Preface



These are the proceedings of the Sixth International Conference on Logic Pro- 
gramming and Nonmonotonic Reasoning (LPNMR2001). The conference was 
held in Vienna from the 17th to the 19th of September, 2001. It was collocated with the
Joint German/Austrian Conference on Artificial Intelligence (24th German/9th
Austrian Conference on Artificial Intelligence), KI2001. 

LPNMR conferences aim to promote research in logic-based programming 
languages, database systems, nonmonotonic reasoning, and knowledge repre- 
sentation. LPNMR 2001 was the sixth conference in the series. The previous 
meetings were held in Washington, DC, in 1991, in Lisbon, Portugal, in 1993, in 
Lexington, Kentucky, in 1995, in Dagstuhl, Germany, in 1997, and in El Paso, 
Texas, in 1999. 

The technical program of LPNMR 2001 comprised five invited talks
that were given by Jürgen Dix, Georg Gottlob, Phokion Kolaitis, Maurizio Lenz-
erini, and Chiaki Sakama. It also contained 23 technical presentations selected
by the program committee during a rigorous review process. Finally, as a part
of the technical program, the conference featured a special session consisting of
nine presentations and demonstrations of implemented nonmonotonic reasoning
systems. All these contributions are included in the proceedings. 

Many individuals worked for the success of the conference. Special thanks 
are due to all members of the program committee and to additional reviewers 
for their efforts to produce fair and thorough evaluations of submitted papers. 
Furthermore, we would like to thank the members of the Knowledge Based 
Systems Group of the Vienna University of Technology, who took care of the
local organization. We particularly appreciated the tireless efforts of Elfriede
Nedoma, secretary to the group. We would also like to thank Gerd Brewka for 
his supportive role in arranging the collocation of the conference with KI2001. 
Last, but not least, we thank the sponsoring institutions for their generosity. 
A Computational Logic Approach to
Heterogenous Agent Systems 



Jürgen Dix *

The University of Manchester, Dept. of CS
Oxford Road, Manchester M13 9PL, UK
dix@cs.man.ac.uk
http://www.cs.man.ac.uk/~jdix



Abstract. I report on a particular approach to heterogenous agent
systems, IMPACT, which is strongly related to computational logic. The
underlying methods and techniques stem from both non-monotonic rea-
soning and logic programming. I present three recent extensions to il-
lustrate the generality and usefulness of the approach: (1) incorporating
planning, (2) uncertain (probabilistic) reasoning, and (3) reducing the
load of serving multiple requests. While (1) illustrates how easy it is to
incorporate hierarchical task networks into IMPACT, (2) makes heavy
use of annotated logic programming and (3) is strongly related to classi-
cal first-order reasoning. This paper is a high-level description of (1)-(3).
More detailed expositions can be found in [1, 2, 3, 4], from which most parts
of this paper are taken.



1 The Basic Framework 

The IMPACT project (http://www.cs.umd.edu/projects/impact) aims at de-
veloping a powerful multi-agent system, which (1) is able to deal with heteroge-
nous and distributed data, (2) can be realized on top of arbitrary legacy code,
but yet (3) is built on a clear foundational basis and (4) scales up for realistic
applications.

In this article I point to some recent extensions of the basic frame-
work (which has been implemented and is running) that show very clearly the
strong links to computational logic, even though IMPACT's implementation is
not realized on top of a logic-related procedural mechanism.

To get a bird’s eye view of IMPACT, here are the most important features: 

— Each IMPACT agent has certain actions available. Agents act in their en- 
vironment according to their agent program and a well defined semantics 
determining which of the actions the agent should execute. 

— Each agent continually undergoes the following cycle: 

* The work I am reporting has been done with many colleagues, notably Th. Eiter,
S. Kraus, H. Munoz-Avila, M. Nanni, D. Nau, F. Ozcan, T.J. Rogers, R. Ross
and, last but not least, V.S. Subrahmanian. It resulted in a variety of papers and I
gratefully acknowledge their support.
(c) Springer-Verlag Berlin Heidelberg 2001
[Figure omitted: IMPACT Architecture]

Fig. 1. SHOP as a planning agent in IMPACT



(1) Get messages from other agents. This changes the state of the agent.

(2) Determine (based on its program, its semantics and its state) for each 
action its status (permitted, obliged, forbidden, . . . ). The agent ends up 
with a set of status atoms. 

(3) Based on a notion of concurrency, determine the actions that can be 
executed and update the state accordingly. 

— IMPACT agents are built on top of arbitrary software code (Legacy Data).

— A methodology for transforming arbitrary software (legacy code) into an
agent has been developed.

A complete description of all these notions is beyond the scope of this paper and we
refer to [3] for a detailed presentation.

Before explaining an agent in more detail, we need to make some comments 
about the general architecture. In IMPACT, agents communicate with other
agents through the network. Not only can they send messages to (and receive
messages from) other agents, they can also ask the server to find out about services that
other agents offer. For example, a planning agent (let us call it A-SHOP), con-
fronted with a particular planning problem, can find out if there are agents out
there with the data needed to solve the planning problem; or agents can provide 
A-SHOP with information about relevant legacy data. 

One of the main features of IMPACT is to provide a method (see [3]) for 
agentizing arbitrary legacy code, i.e. to turn such legacy code into an agent. In 
order to do this, we need to abstract from the given code and describe its main 
features. Such an abstraction is given by the set of all datatypes and functions 
the software is managing. We call this a body of software code and denote it by 
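$\mathcal{S} = (\mathcal{T}_\mathcal{S}, \mathcal{F}_\mathcal{S})$, where $\mathcal{T}_\mathcal{S}$ is the set of data types managed by the software and
$\mathcal{F}_\mathcal{S}$ is a set of predefined functions which makes access to the data
objects managed by the agent available to external processes.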

For example, in many applications a statistics agent is needed. This agent
keeps track of distances between two given points and the authorized range or ca-
pacity of certain vehicles. This information can be stored in several databases.
Another example is the supplier agent. It determines through its databases
which vehicles are accessible at a given location.

Definition 1 (State of an Agent, $\mathcal{O}_\mathcal{S}(t)$). At any given point t in time, the
state of an agent, denoted $\mathcal{O}_\mathcal{S}(t)$, is the set of all data objects that are currently
stored in the relations the agent handles; the types of these objects must be in
the base set of types in $\mathcal{T}_\mathcal{S}$.

In the examples just mentioned, the state of the statistics agent consists of all 
tuples stored in the databases it handles. The state of the supplier agent is the 
set of all tuples describing which vehicles are accessible at a given location. 

We noted that agents can send and receive messages. There is therefore a
special data structure, the message box, that is part of each agent. This message box is
just one of those types. Thus a state change occurs as soon as a message is
received.



1.1 The Code Call Machinery 

To perform logical reasoning on top of third-party data structures (which are
part of the agent’s state) and code, the agent must have a language within 
which it can reason about the agent state. We therefore introduce the concept 
of a code call atom, which is the basic syntactic object used to access multiple 
heterogeneous data sources. 

Definition 2 (Code Calls (cc)). Suppose $\mathcal{S} =_{def} (\mathcal{T}_\mathcal{S}, \mathcal{F}_\mathcal{S})$ is some software
code, $f \in \mathcal{F}_\mathcal{S}$ is a predefined function with n arguments, and $d_1, \ldots, d_n$ are objects
or variables such that each $d_i$ respects the type requirements of the i'th argument
of f. Then, $\mathcal{S} : f(d_1, \ldots, d_n)$ is a code call. A code call is ground if all the $d_i$'s
are objects.

We often identify software code S with the agent that is built on top of it. 
This is because an agent really is uniquely determined by it. 

A code call executes an API function and returns as output a set of objects
of the appropriate output type. Going back to our two agents introduced above,
statistics may be able to execute the cc statistics : distance(locFrom, locTo).
The supplier agent may execute the following cc:
supplier : cargoPlane(locFrom).

What we really need to know is if the result of evaluating such code calls 
is contained in a certain set or not. To do this, we introduce code call atoms. 
These are logical atoms that are layered on top of code calls. They are defined 
through the following inductive definition. 
[Figure omitted: a single agent]

Fig. 2. An Agent in IMPACT

Definition 3 (Code Call Atoms (in(X, cc))). If cc is a code call, and X is
either a variable symbol, or an object of the output type of cc, then in(X, cc) and
not_in(X, cc) are code call atoms. not_in(X, cc) succeeds if X is not in the set
of objects returned by the code call cc.

Code call atoms, when evaluated, return boolean values, and thus may be thought 
of as special types of logical atoms. Intuitively, a code call atom of the form 
in(X, cc) succeeds if X can be set to a pointer to one of the objects in the set of 
objects returned by executing the code call. 

As an example, the code call atom
in(f22, supplier : cargoPlane(collegepark)) tells us that the particular plane
“f22” is available as a cargo plane in College Park.
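To make the layering of atoms on top of code calls concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the registry `table`, the helper names `code_call`, `in_atom` and `not_in_atom`, and the toy data in `PLANES` are all hypothetical and not part of IMPACT.

```python
from typing import Callable, Dict, Set, Tuple

# Hypothetical registry mapping (agent, function) to a Python callable that
# returns a set of objects, mimicking the predefined functions of a body of
# software code.
CodeCallTable = Dict[Tuple[str, str], Callable[..., Set[object]]]


def code_call(table: CodeCallTable, agent: str, func: str, *args) -> Set[object]:
    """Execute the ground code call agent : func(args) and return its result set."""
    return set(table[(agent, func)](*args))


def in_atom(x: object, result: Set[object]) -> bool:
    """in(X, cc) for a ground X: succeeds if X is among the returned objects."""
    return x in result


def not_in_atom(x: object, result: Set[object]) -> bool:
    """not_in(X, cc): succeeds if X is not among the returned objects."""
    return x not in result


# Toy data, invented purely for this example.
PLANES = {"collegepark": {"f22", "c130"}, "bwi": {"b747"}}
table: CodeCallTable = {
    ("supplier", "cargoPlane"): lambda loc: PLANES.get(loc, set()),
}

result = code_call(table, "supplier", "cargoPlane", "collegepark")
print(in_atom("f22", result))       # True
print(not_in_atom("b747", result))  # True
```

The sketch handles only ground atoms; binding a variable X to each returned object is a separate step.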

Often, the results of evaluating code calls give us back certain values that 
we can compare. Based on such comparisons, certain actions might be fired 
or not. To this end, we need to define code call conditions. Intuitively, a code 
call condition is a conjunction of code call atoms, equalities, and inequalities. 
Equalities and inequalities can be seen as additional syntax that “links” together
variables occurring in the atomic code calls. 

Definition 4 (Code Call Conditions (ccc)). 

1. Every code call atom is a code call condition. 

2. If s, t are either variables or objects, then s = t is a code call condition.

3. If s, t are either integer/real valued objects, or are variables over the inte-
gers/reals, then $s < t$, $s > t$, $s \geq t$, $s \leq t$ are code call conditions.

4. If $\chi_1, \chi_2$ are code call conditions, then $\chi_1 \,\&\, \chi_2$ is a code call condition.

A code call condition satisfying any of the first three criteria above is an atomic 
code call condition. 
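Definition 4 can be read as a small query language: atoms are evaluated left to right, and each in(X, cc) either binds X or checks an existing binding. The following Python sketch shows one way this could work. The tuple encoding of conjuncts, the `Var` class, the `solve` function and the distance data are assumptions made for illustration, not the IMPACT implementation.

```python
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A variable symbol occurring in a code call condition."""
    name: str


OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt,
       "<=": operator.le, ">=": operator.ge}


def resolve(term, binding):
    """Replace a bound variable by its value; objects are returned unchanged."""
    return binding[term] if isinstance(term, Var) else term


def solve(conjuncts, functions, binding=None):
    """Yield every variable binding that satisfies all conjuncts, left to right."""
    binding = dict(binding or {})
    if not conjuncts:
        yield binding
        return
    first, rest = conjuncts[0], conjuncts[1:]
    if first[0] == "in":
        _, target, (agent, func, args) = first
        values = functions[(agent, func)](*[resolve(a, binding) for a in args])
        for value in values:
            if isinstance(target, Var) and target not in binding:
                # Unbound variable: bind it to each returned object in turn.
                yield from solve(rest, functions, {**binding, target: value})
            elif resolve(target, binding) == value:
                # Object or already bound variable: membership test only.
                yield from solve(rest, functions, binding)
    else:
        _, op, left, right = first
        if OPS[op](resolve(left, binding), resolve(right, binding)):
            yield from solve(rest, functions, binding)


# Invented data standing in for the statistics agent's databases.
DIST = {("umd", "bwi"): 30, ("umd", "mit"): 440}
functions = {("statistics", "distance"): lambda a, b: {DIST[(a, b)]}}

D = Var("D")
condition = [("in", D, ("statistics", "distance", ("umd", "bwi"))),
             ("cmp", "<", D, 50)]
print(list(solve(condition, functions)))
```

A comparison over an unbound variable raises a `KeyError` in this sketch, which is a crude stand-in for a condition that cannot be evaluated.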
1.2 Agent Programs and Semantics

We now come to the very heart of the definition of an agent: its agent
program. Such a program consists of rules of the form:

$Op\,\alpha(t_1, \ldots, t_m) \leftarrow Op_1\,\beta_1(\ldots), \ldots, Op_n\,\beta_n(\ldots), ccc_1, \ldots, ccc_r,$

where $\alpha, \beta_1, \ldots, \beta_n$ are actions (the agent can execute), $Op_1, \ldots, Op_n$ describe
the status of the action (obliged, forbidden, waived, doable) and the $ccc_i$ are code call
conditions to be evaluated in the actual state.

Thus, the $Op_j$ are operators that take actions as arguments. They describe the
status of the arguments they take. Here are some examples of actions: (1) to load
some cargo from a certain location, (2) to fly a plane from a certain location to
another location, (3) to unload some cargo from a certain location. The action
status atom F load (resp. Do fly) means that the action load is forbidden (resp.
fly should be done). Actions themselves are terms; they become atoms only with
an operator in front of them.

In IMPACT, actions are very much like STRIPS operators: they have pre-
conditions and add- and delete-lists (see appendix). The difference from STRIPS is
that these preconditions and lists consist of arbitrary code call conditions, not
just of logical atoms.

Figure 2 illustrates that the agent program together with the chosen seman- 
tics SEM and the state of the agent determines the set of all status atoms. 
However, the doable actions among them might be conflicting and therefore we 
have to use the chosen concurrency notion to finally determine which actions
can be concurrently executed. The agent then executes these actions and changes 
its state. 

1.3 Evaluability of ccc’s 

Code call conditions provide a simple but powerful language syntax to access
heterogeneous data structures and legacy software code. However, in general
their use in agent programs is not restricted. In particular, it is possible that a ccc
cannot be evaluated (and thus the status of actions cannot be determined) sim-
ply because there are uninstantiated variables and thus the underlying functions
cannot be executed. Here is a simple example.

Example 1 (Sample ccc). The code call condition 

in(FinanceRec, rel : select(financeRel, date, "=", "11/15/99")) &
FinanceRec.sales > 10K &
in(C, excel : chart(excelFile, FinanceRec, day)) &
in(Slide, ppt : include(C, "presentation.ppt"))

is a complex condition that accesses and merges data across a relational database,
an Excel file, and a PowerPoint file. It first selects all financial records associated
with "11/15/99": this is done with the variable FinanceRec in the first line. It
then keeps only those records having sales of more than 10K (second line). Using
the remaining records, an Excel chart is created with day of sale on the x-axis
(third line), and the resulting chart is included in the PowerPoint file "presentation.ppt"
(fourth line).

Fig. 3. A code call evaluation graph (cceg)

In the above example, it is very important that the first code call be evaluable. If
financeRel were a variable, then rel : select(FinanceRel, date, "=", "11/15/99")
would not be evaluable, unless there were another condition instantiating this
variable.

We have introduced syntactic conditions, similar to safety in classical data- 
bases, to ensure evaluability of ccc’s. It is also quite easy to store ccc’s as eval- 
uation graphs (see Figure 3), thereby making explicit the dependency relation 
between its constituents (see [4]). It is, however, still perfectly possible that the 
execution of a code call does not terminate and we have to add another condition 
to ensure termination (see Subsection 2.3). 
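The dependency idea behind such a syntactic check can be sketched in a few lines of Python. This is a simplified illustration, not the safety definition of [3] or [4]: each atom is described only by the variables it needs as input and the variables it binds, and the hypothetical function `evaluation_order` looks for an ordering in which every atom's inputs are already instantiated.

```python
def evaluation_order(atoms, bound=()):
    """Return atom names in an evaluable order, or None if no such order exists.

    atoms: list of (name, needs, binds), where `needs` are the variables that
    must be instantiated before the atom can run and `binds` are the variables
    it instantiates. This encoding is an assumption made for the sketch.
    """
    bound = set(bound)
    pending = list(atoms)
    order = []
    while pending:
        # Pick any atom whose inputs are all instantiated; the set of bound
        # variables only grows, so a greedy choice never blocks a solution.
        ready = next((a for a in pending if set(a[1]) <= bound), None)
        if ready is None:
            return None  # some variable can never be instantiated
        order.append(ready[0])
        bound |= set(ready[2])
        pending.remove(ready)
    return order


# The four conjuncts of Example 1, deliberately listed out of order.
example1 = [
    ("chart", {"FinanceRec"}, {"C"}),
    ("include", {"C"}, {"Slide"}),
    ("sales_filter", {"FinanceRec"}, set()),
    ("select", set(), {"FinanceRec"}),
]
print(evaluation_order(example1))

# If financeRel were an uninstantiated variable, select could never run.
broken = [("select", {"FinanceRel"}, {"FinanceRec"})] + example1[:3]
print(evaluation_order(broken))  # None
```

The edges implied by `needs` and `binds` are the same dependencies that an evaluation graph makes explicit.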

2 Planning 

In this section we show how an HTN planning system, SHOP, can be integrated
into the IMPACT multi-agent environment. We define the A-SHOP algorithm, an
agentized adaptation of the original SHOP planning algorithm ([5]) that takes ad-
vantage of IMPACT's capabilities for interacting with external agents, perform-
ing mixed symbolic/numeric computations, and making queries to distributed,
heterogeneous information sources (such as arbitrary legacy and/or specialized
data structures or external databases). We also show that A-SHOP is both sound
and complete if certain conditions (related to evaluability and termination of the
underlying code calls) are met.



2.1 HTN Planning 

Rather than giving a detailed description of the kind of HTN planning used by 
SHOP ([5]), we consider the following example taken from [2]. 

In order to do planning in a given planning domain, SHOP needs to be given 
knowledge about that domain. SHOP's knowledge base contains operators and
methods. Each operator is a description of what needs to be done to accomplish 
some primitive task, and each method is a prescription for how to decompose 
some complex task into a totally ordered sequence of subtasks, along with various 
restrictions that must be satisfied in order for the method to be applicable. 

Given the next task to accomplish, SHOP chooses an applicable method, in- 
stantiates it to decompose the task into subtasks, and then chooses and instan- 
tiates other methods to decompose the subtasks even further. If the constraints 
on the subtasks prevent the plan from being feasible, SHOP will backtrack and 
try other methods. 

As an example, Figure 4 shows two methods for the task of travelling from
one location to another: travelling by air, and travelling by taxi. Travelling by air
involves the subtasks of purchasing a plane ticket, travelling to the local airport, 
flying to an airport close to our destination, and travelling from there to our 
destination. Travelling by taxi involves the subtasks of calling a taxi, riding in 
it to the final destination, and paying the driver. 

Note that each method’s preconditions are not used to create subgoals (as 
would be done in action-based planning). Rather, they are used to determine 
whether or not the method is applicable: thus in Figure 4, the travel by air 
method is only applicable for long distances, and the travel by taxi method is 
only applicable for short distances. 
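The decomposition process described above can be sketched as a small recursive planner in Python. This is a toy illustration under stated assumptions, not SHOP itself: the state is never modified by operators, the function `plan`, the distance table, the airport map and the threshold of 100 separating short from long distances are all invented for the example.

```python
def plan(tasks, state, methods, operators):
    """Decompose a totally ordered task list into primitive tasks.

    Returns a list of primitive tasks, or None if every applicable method fails.
    """
    if not tasks:
        return []
    first, rest = tasks[0], tasks[1:]
    name, *args = first
    if name in operators:
        tail = plan(rest, state, methods, operators)
        return None if tail is None else [first] + tail
    for precondition, subtasks in methods.get(name, []):
        # Preconditions only test applicability; they create no subgoals.
        if precondition(state, *args):
            result = plan(subtasks(state, *args) + rest, state, methods, operators)
            if result is not None:
                return result  # otherwise backtrack and try the next method
    return None


state = {
    "dist": {("UMD", "MIT"): 440, ("UMD", "BWI"): 30, ("Logan", "MIT"): 5},
    "airport": {"UMD": "BWI", "MIT": "Logan"},
}

def by_air(s, x, y):
    a = s["airport"]
    return [("buy_ticket", a[x], a[y]), ("travel", x, a[x]),
            ("fly", a[x], a[y]), ("travel", a[y], y)]

def by_taxi(s, x, y):
    return [("get_taxi",), ("ride_taxi", x, y), ("pay_driver",)]

methods = {"travel": [
    (lambda s, x, y: s["dist"][(x, y)] > 100, by_air),    # long distances
    (lambda s, x, y: s["dist"][(x, y)] <= 100, by_taxi),  # short distances
]}
operators = {"buy_ticket", "fly", "get_taxi", "ride_taxi", "pay_driver"}

for step in plan([("travel", "UMD", "MIT")], state, methods, operators):
    print(step)
```

The resulting plan flies from BWI to Logan and uses a taxi at each end.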

Here are some of the complications that can arise during the planning process: 

— The planner may need to recognize and resolve interactions among the sub- 
tasks. For example, in planning how to travel to the airport, one needs to 
make sure one will arrive at the airport in time to catch the plane. To make 
the example in Figure 4 more realistic, such information would need to be
specified as part of SHOP's methods and operators.



Task 



travel(A:,;^) 



(^^^^avelbyair^ 
long travel-distance ^ 



Methods 

Precon- 
1 .'" ditions 



travel 



by taxi^ 






travel(UMD, MIT) 













get taxi 


ride taxi (x,y) 


pay driver 



buy ticket(BWI, Logan) 
travel(UMD, BWI) 



^ get taxi 

ride taxi(UMD, BWI) 
pay driver 



fly(BWI, Logan) 
travel (Logan, MIT) 



buy ticket(a(x), a(y)) 


travel{x, a(x)) 


fly(a(x), aO)) 


travel(a(y)j;) 



Subtasks 



^ get taxi 

ride taxifLogan, MIT) 



Fig. 4. Travel planning example 






Jurgen Dix 



— In the example in Figure 4, it was always obvious which method to use. But 
in general, more than one method may be applicable to a task. If it is not 
possible to solve the subtasks produced by one method, SHOP will backtrack 
and try another method instead. 



2.2 Agentization of SHOP

A comparison between IMPACT's actions and SHOP's methods shows that IM-
PACT actions correspond to fully instantiated methods, i.e. no subtasks. While
SHOP's methods and operators are based on STRIPS, the first step is to modify
the atoms in SHOP's preconditions and effects, so that SHOP's preconditions
will be evaluated by IMPACT's code call mechanism and the effects will change
the state of the IMPACT agents. This is a fundamental change in the representa-
tion of SHOP. In particular, it requires replacing SHOP's methods and operators
with agentized methods and operators. These are defined as follows.

Definition 5 (Agentized Method: (AgentMeth $h$ $\chi$ $t$)). An agentized me-
thod is an expression of the form (AgentMeth $h$ $\chi$ $t$) where $h$ (the method's
head) is a compound task, $\chi$ (the method's preconditions) is a code call condition
and $t$ is a totally ordered list of subtasks, called the task list.

The primary difference between the definition of an agentized method and the
definition of a method in SHOP is as follows. In SHOP, preconditions were logical 
atoms, and SHOP would infer these preconditions from its current state of the 
world using Horn-clause inference. In contrast, the preconditions in an agentized 
method are IMPACT’S code call conditions rather than logical atoms, and A- 
SHOP (the agentized version of SHOP defined in the next section) does not 
use Horn-clause inference to establish these preconditions but instead simply 
invokes those code calls, which are calls to other agents (which may be Horn- 
clause theorem provers or may instead be something entirely different). 

Definition 6 (Agentized Operator: (AgentOp $h$ $\chi_{add}$ $\chi_{del}$)). An agentized
operator is an expression of the form (AgentOp $h$ $\chi_{add}$ $\chi_{del}$), where $h$ (the
head) is a primitive task and $\chi_{add}$ and $\chi_{del}$ are lists of code calls (called the
add- and delete-lists). The set of variables in the tasks in $\chi_{add}$ and $\chi_{del}$ is a
subset of the set of variables in $h$.



The Algorithm 

The A-SHOP algorithm is now an easy adaptation of the original SHOP algo- 
rithm. Unlike SHOP (which would apply an operator by directly inserting and 
deleting atoms from an internally-maintained state of the world), A-SHOP needs 
to reason about how the code calls in an operator will affect the states of other 
agents. One might think the simplest way to do this would be simply to tell these 
agents to execute the code calls and then observe the results, but this would not 
work correctly. Once the planning process has ended successfully, A-SHOP will 
procedure A-SHOP(t, D)

1. if the task list t = nil then return nil

2. t := the first task in the task list; R := the remaining tasks

3. if t is primitive and a simple plan for t exists then

4.     q := simplePlan(t)

5.     return concatenate(q, A-SHOP(R, D))

6. else if t is non-primitive ∧ there is a reduction of t then

7.     nondeterministically choose a reduction:
       Nondeterministically choose an agentized method,
       (AgentMeth h χ t), with μ the most general
       unifier of h and t and substitution θ s.t.
       χμθ is ground and holds in IMPACT's state O.

8.     return A-SHOP(concatenate(tμθ, R), D)

9. else return FAIL

10. end if
end A-SHOP

procedure simplePlan(t)

11. nondeterministically choose an agentized operator
    Op = (AgentOp h χ_add χ_del) with ν the most
    general unifier of h and t s.t. hν is ground

12. monitoring : apply(Op ν)

13. return Op ν

end simplePlan

Fig. 5. A-SHOP, the agentized version of SHOP

return a plan whose operators can be applied to modify the states of the other 
IMPACT agents — but A-SHOP should not change the states of those agents dur- 
ing its planning process because this would prevent A-SHOP from backtracking 
and trying other operators. 

Thus in Step 12, A-SHOP does not issue code calls to the other agents directly,
but instead communicates them to a monitoring agent. The monitoring agent
keeps track of all operators that are supposed to be applied, without actually
modifying the states of the other IMPACT agents. When A-SHOP queries for a
code call $cc = \mathcal{S} : f(d_1, \ldots, d_n)$ in $\chi$ to evaluate a method's precondition (Step
7), the monitoring agent examines whether cc has been affected by the intended
modifications of the operators and, if so, it evaluates cc. If cc is not affected by the
application of operators, IMPACT evaluates cc (i.e., by accessing $\mathcal{S}$). The list
of operators maintained by the monitoring agent is reset every time a planning
process begins. The apply function applies the operators and creates copies of
the state of the world. Depending on the underlying software code, these changes
might be easily reversible or not. In the latter case, the monitoring agent has to
keep track of the old state of the world.
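The role of the monitoring agent can be illustrated with a small overlay structure in Python. This is a heavily simplified sketch: the world state is modelled as a set of ground facts rather than as the result of code calls, and the class `MonitoringAgent` with its methods is hypothetical, chosen only to show how intended changes can be recorded and undone without touching the real state.

```python
class MonitoringAgent:
    """Records intended operator applications without changing the real state."""

    def __init__(self, real_state):
        self._real = frozenset(real_state)  # the agents' actual state, never mutated
        self._log = []                      # applied operators, in order

    def reset(self):
        """Start a new planning process with an empty operator list."""
        self._log.clear()

    def apply(self, name, add=(), delete=()):
        """Record that operator `name` would add and delete the given facts."""
        self._log.append((name, frozenset(add), frozenset(delete)))

    def undo(self):
        """Drop the most recent operator, e.g. when the planner backtracks."""
        return self._log.pop()

    def holds(self, fact):
        """Evaluate a fact against the real state plus all recorded changes."""
        present = fact in self._real
        for _, add, delete in self._log:
            if fact in delete:
                present = False
            if fact in add:
                present = True
        return present

    def plan(self):
        return [name for name, _, _ in self._log]


real = {("at", "cargo", "umd")}
monitor = MonitoringAgent(real)
monitor.apply("load",
              add={("in", "cargo", "plane")},
              delete={("at", "cargo", "umd")})
print(monitor.holds(("in", "cargo", "plane")))  # True
print(monitor.holds(("at", "cargo", "umd")))    # False
monitor.undo()                                  # backtrack
print(monitor.holds(("at", "cargo", "umd")))    # True
```

Because the real state is only read, backtracking amounts to popping entries from the log.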
2.3 Finite Evaluability of ccc's and Completeness of A-SHOP

An important question for any planning algorithm is whether all solution plans 
produced by the algorithm are correct (i.e., soundness of the algorithm) and 
whether the algorithm will find solutions for solvable problems (i.e., complete-
ness of the algorithm). Soundness and completeness proofs of classical planners
assume that the preconditions can be evaluated relative to the current state. In 
SHOP, for example, the state is accessed to test whether a method is applicable, 
by examining whether the method’s preconditions are valid in the current state. 
Normally it is easy to guarantee the ability to evaluate preconditions, because 
the states typically are lists of predicates that are locally accessible to the plan- 
ner. However, if these lists of predicates are replaced by code call conditions, 
this is no longer the case. 

We mentioned in Subsection 1.3 the condition of safeness to ensure evalua-
bility of a code call condition. We also mentioned that the evaluation of an
evaluable code call does not need to terminate. Consider the code call condition

in(X, math : geq(25)) &
in(Y, math : square(X)) & Y < 2000,

which describes all numbers that are less than 2000 and that are squares of an
integer greater than or equal to 25.

Clearly, over the integers there are only finitely many ground substitutions 
that cause this code call condition to be true. Furthermore, this code call condi- 
tion is safe. However, its evaluation may never terminate. The reason for this is 
that safety requires that we first compute the set of all integers that are greater
than or equal to 25, leading to an infinite computation.
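The problem is easy to reproduce in Python by modelling math : geq as an infinite stream. The sketch below is illustrative only. The cutoff it uses relies on an extra assumption that is not part of the condition itself, namely that squaring is increasing on the integers from 25 upwards, so enumeration can stop at the first square that reaches 2000.

```python
from itertools import count, takewhile


def geq(n):
    """Stand-in for math : geq(n): the infinite stream n, n+1, n+2, ..."""
    return count(n)


def square(x):
    """Stand-in for math : square(x)."""
    return x * x


# Naive left-to-right evaluation would first materialise geq(25), for example
# with set(geq(25)), and would therefore never return.

# With the monotonicity assumption stated above, evaluation can stop early.
answers = [(x, square(x))
           for x in takewhile(lambda x: square(x) < 2000, geq(25))]

print(len(answers))   # 20
print(answers[0])     # (25, 625)
print(answers[-1])    # (44, 1936)
```

Only 20 substitutions satisfy the condition, yet nothing in its syntax tells an evaluator when to stop.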

Thus in general, we must impose some restrictions on code call conditions to 
ensure that they are finitely evaluable. This is precisely what the condition of 
strong safeness ([6, 3]) does for code call conditions. Intuitively, by requiring
that the code call condition is safe, we are ensuring that it is executable and by 
requiring that it is strongly safe, we are ensuring that it will only return finitely 
many answers. 

Note that the problem of deciding whether an arbitrary code call execution
terminates is undecidable (and so is the problem of deciding whether a code call
condition $\chi$ holds in $\mathcal{O}$). Therefore we need some input from the agent designer (or
from the person who is responsible for the legacy code the agent is built upon). The
information needed is stored in a finiteness table (see [6, 3]). This information is
used in the purely syntactic notion of strong safeness. It is a compile-time check,
an extension of the well-known (syntactic) safety condition in databases.

Lemma 1 (Evaluating Agentized Operators). Let (AgentMeth $h$ $\chi$ $t$) be an
agentized method, $\mathcal{O}$ a state, and (AgentOp $h'$ $\chi_{add}$ $\chi_{del}$) an agentized operator.
If the precondition $\chi$ is strongly safe wrt. the variables in $h$, the problem of
deciding whether $\chi$ holds in $\mathcal{O}$ can be algorithmically solved. If the add- and
delete-lists $\chi_{add}$ and $\chi_{del}$ are strongly safe wrt. the variables in $h'$, the problem
of applying the agentized operator to $\mathcal{O}$ can be algorithmically solved.
Theorem 1 (Soundness, Completeness). Let $\mathcal{O}$ be a state and $\mathcal{D}$ be a collec-
tion of agentized methods and operators. If all the preconditions in the agentized
methods and add- and delete-lists in the agentized operators are strongly safe
wrt. the respective variables in the heads, then A-SHOP is correct and complete.



3 Probabilistic Reasoning 

Up to now our framework of agent programs does not allow us to reason about 
uncertain information. Consider a code call of the form d : f(args). This code
call returns a set of objects. If an object o is returned by such a code call, then
this means that o is definitely in the result of evaluating d : f(args).

However, there are many cases, particularly in applications involving rea- 
soning about knowledge, where a code call may need to return an “uncertain” 
answer. We show in this section that our framework can be easily generalized to 
deal with probabilistic reasoning. 

Example 2 (Surveillance Example). Consider a surveillance application where 
there are hundreds of (identical) surveillance agents, and a geographic agent. 
The data types associated with the surveillance and geographic agent include 
the standard int, bool, real, string, file data types, plus those shown below:


Surveillance Agent           Geographic Agent

image: record of             map: ↑quadtree;
    imageid: file;           quadtree: record of
    day: date;                   place: string;
    time: int;                   xcoord: int;
    location: string             ycoord: int;
imagedb: setof image;            pop: int
                                 nw, ne, sw, se: ↑quadtree


A third agent may well merge information from these two agents, tracking a 
sequence of surveillance events. 

The surv agent may support a function surv : identify() which takes as input
an image and returns as output the set of all identified vehicles in it. It may also
support a function called surv : turret() that takes as input a vehicle id and
returns as output the type of gun turret it has. Likewise, the geo agent may
support a function geo : getplnode() which takes as input a map and the name
of a place and returns the set of all nodes with that name as the place-field, a
function geo : getxynode() which takes as input a map and the coordinates of
a place and returns the set of all nodes with those coordinates, and a
function called geo : range() that takes as input a map, an x,y coordinate pair,
and a distance r and returns as output the set of all nodes in the map (quadtree)
that are within r units of location (x, y).

In this example, surv : identify(image1) tries to identify all objects in a given
image; however, it is well known that image identification is an uncertain task.
Some objects may be identified with 100% certainty, while in other cases, it may
only be possible to say it is either a T-72 tank with 40-50% probability, or a
T-80 tank with 50-60% probability.

Image processing algorithms for vehicle surveillance applications that return 
probabilistic identifications are readily available (e.g., see [7] and [8]). 

3.1 Probabilistic Code Calls 

The first step to extend our framework is to introduce the notion of a probabilistic 
code call. Its main ingredient is a random variable. 

Definition 7 (Random Variable of Type $\tau$). A random variable of type $\tau$
is a finite set RV of objects of type $\tau$, together with a probability distribution p
that assigns real numbers in the unit interval [0, 1] to members of RV such that
$\sum_{o \in RV} p(o) \leq 1$.

It is important to note that in classical probability theory [9], random variables
satisfy the stronger requirement that $\sum_{o \in RV} p(o) = 1$. However, in many real-
life situations, a probability distribution may have missing pieces, which explains 
why we have chosen a weaker definition. 
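The weaker requirement is straightforward to enforce in code. The following Python sketch is one possible encoding, with the class name `RandomVariable`, its methods and the sample probabilities chosen purely for illustration: it rejects distributions whose mass exceeds 1 but accepts those with missing mass.

```python
import math
from dataclasses import dataclass
from typing import Dict, Hashable


@dataclass
class RandomVariable:
    """A finite set of objects with probabilities summing to at most 1."""
    distribution: Dict[Hashable, float]

    def __post_init__(self):
        if any(not 0.0 <= p <= 1.0 for p in self.distribution.values()):
            raise ValueError("each probability must lie in [0, 1]")
        # A small tolerance absorbs floating-point rounding in the sum.
        if self.total_mass() > 1.0 + 1e-9:
            raise ValueError("probabilities must sum to at most 1")

    def total_mass(self) -> float:
        return math.fsum(self.distribution.values())

    def missing_mass(self) -> float:
        """Probability not assigned to any object in RV."""
        return max(0.0, 1.0 - self.total_mass())


# Illustrative values: an object that is a T-72 or a T-80, with some mass missing.
tank = RandomVariable({"t72": 0.5, "t80": 0.4})
print(round(tank.missing_mass(), 6))

try:
    RandomVariable({"t72": 0.8, "t80": 0.4})
except ValueError as err:
    print("rejected:", err)
```

A classical random variable is the special case in which `missing_mass()` is zero.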

Definition 8 (Probabilistic Code Call $a :_{RV} f(d_1, \ldots, d_n)$). Suppose the
code call $a : f(d_1, \ldots, d_n)$ has output type $\tau$. The probabilistic code call associ-
ated with $a : f(d_1, \ldots, d_n)$, denoted $a :_{RV} f(d_1, \ldots, d_n)$, returns a set of random
variables of type $\tau$ when executed.

Example 3. Consider the code call surv : identify(image1). This code call may
return the following two random variables.

({t72, t80}, {(t72, 0.5), (t80, 0.4)}) and ({t60, t84}, {(t60, 0.3), (t84, 0.7)})

This says that the image processing algorithm has identified two objects in im-
age1. The first object is either a T-72 or a T-80 tank with 50% and 40% proba-
bility, respectively, while the second object is either a T-60 or a T-84 tank with
30% and 70% probability, respectively.

Probabilistic code calls and code call conditions look exactly like ordinary code 
calls and code call conditions — however, as a probabilistic code call returns a 
set of random variables, probabilistic code call atoms are true or false with some 
probability. 

We are now ready to generalize the notion of a state of an agent to its 
probabilistic counterpart. 

Definition 9 (Probabilistic State of an Agent). The probabilistic state of
an agent a at any given point t in time, denoted $\mathcal{O}^p(t)$, consists of the set of all
instantiated data objects and random variables of types contained in $\mathcal{T}_a$.
3.2 Conjunction Strategy and Probabilistic Agent Programs

The next step is to define the satisfaction relation of probabilistic code call 
conditions. This is problematic as the following example illustrates. 

Example 4. Consider the probabilistic code call condition

in(X, surv $:_{RV}$ identify(image1)) & in(a1, surv $:_{RV}$ turret(X)).

This code call condition attempts to find all vehicles in "image1" with a gun
turret of type a1. Let us suppose that the first code call returns just one random
variable specifying that image1 contains one vehicle which is either a T-72
(probability 50%) or a T-80 tank (probability 40%). When this random variable (X) is
passed to the second code call, it returns one random variable with two values:
a1 with probability 30% and a2 with probability 65%. What is the probability
that the code call condition above is satisfied by a particular assignment to X?
The answer to this question depends very much upon the knowledge we have (if
any) about the dependencies between the identification of a tank as a T-72 or
a T-80, and the type of gun turret on these. For instance, if we know that all
T-72's have a2 type turrets, then the probability of the conjunct being true when
X is a T-72 tank is 0. On the other hand, it may be that the turret identification
and the vehicle identification are independent for T-80s. Hence, when X is set
to T-80, the probability of the conjunct being true is 0.4 × 0.3 = 0.12.

Therefore the probability that a conjunction is true depends not only on the 
probabilities of the individual conjuncts, but also on the dependencies between 
the events denoted by these conjuncts. 

We have solved this problem by introducing the notion of a probabilistic
conjunction strategy $\otimes$ to capture these different ways of computing probabilities
via an abstract definition. We also use annotations to represent probability
intervals. For instance, [0, 0.4], [0.7, 0.9], [0.1, V/2], [V/4, V/2] are all annotations. The
annotation [0.1, V/2] denotes an interval only when a value in [0, 1] is assigned to
the variable V.

Definition 10 (Annotated Code Call Condition $\chi : \langle [ai_1, ai_2], \otimes \rangle$). If $\chi$
is a probabilistic code call condition, $\otimes$ is a conjunction strategy, and $[ai_1, ai_2]$
is an annotation, then $\chi : \langle [ai_1, ai_2], \otimes \rangle$ is an annotated code call condition.
$\chi : \langle [ai_1, ai_2], \otimes \rangle$ is ground if there are no variables in either $\chi$ or in $[ai_1, ai_2]$.

Intuitively, the ground annotated code call condition $\chi : \langle [ai_1, ai_2], \otimes \rangle$ says that
the probability of $\chi$ being true (under conjunction strategy $\otimes$) lies in the interval
$[ai_1, ai_2]$. For example, when X is ground,

in(X, surv :RV identify(image1)) & in(a1, surv :RV turret(X)) : $\langle [0.3, 0.5], \otimes_{ig} \rangle$

is true if and only if the probability that X is identified by the surv agent and
that the turret is identified as being of type a1 lies between 30% and 50%, assuming
that nothing is known about the dependencies between turret identifications and
identifications of objects by surv.
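
To make the role of a conjunction strategy concrete, the following Python sketch computes the probability interval of a conjunction from the intervals of its two conjuncts. The text above names only the ignorance strategy $\otimes_{ig}$ and gives no formulas; the three formulas used here (ignorance, independence, positive correlation) are the ones commonly used in the probabilistic logic programming literature and are an assumption of this sketch, as are all function names.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound), both in [0, 1]


def conj_ignorance(a: Interval, b: Interval) -> Interval:
    """Nothing is known about the dependency between the two events."""
    (l1, u1), (l2, u2) = a, b
    return (max(0.0, l1 + l2 - 1.0), min(u1, u2))


def conj_independence(a: Interval, b: Interval) -> Interval:
    """The two events are assumed to be independent."""
    (l1, u1), (l2, u2) = a, b
    return (l1 * l2, u1 * u2)


def conj_positive_correlation(a: Interval, b: Interval) -> Interval:
    """One event is assumed to imply the other."""
    (l1, u1), (l2, u2) = a, b
    return (min(l1, l2), min(u1, u2))


# Example 4 with X = T-80: identification 40%, turret a1 30% (point intervals).
t80 = (0.4, 0.4)
a1 = (0.3, 0.3)
independent = conj_independence(t80, a1)
ignorant = conj_ignorance(t80, a1)
```

Under independence the sketch reproduces the value 0.4 × 0.3 = 0.12 from Example 4, whereas under ignorance only a bound on the probability is obtained (at most 0.3).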

We are now ready to define the concept of a probabilistic agent program.

Definition 11 (Probabilistic Agent Programs $\mathcal{PP}$). Suppose $\Gamma$ is an
annotated code call condition, and $A, L_1, \ldots, L_n$ are status atoms. Then

$A \leftarrow \Gamma \,\&\, L_1 \,\&\, \ldots \,\&\, L_n$ (1)

is a probabilistic agent rule. For such a rule r, we use $B^{+}_{as}(r)$ to denote the
positive status atoms in $\{L_1, \ldots, L_n\}$, and $B^{-}_{as}(r)$ to denote the set of negative
status literals in $\{L_1, \ldots, L_n\}$.

A probabilistic agent program (pap for short) is a finite set of probabilistic 
agent rules. 

Consider an intelligent sensor agent that is performing surveillance tasks. 
The following rules specify a small pap that such an agent might use. 

Do send_warn(X) $\leftarrow$ (in(F, surv : file(imagedb)) &
    in(X, surv :RV identify(F)) &
    in(a1, surv :RV turret(X))) : $\langle [0.7, 1.0], \otimes_{ig} \rangle$ &
    $\neg$F send_warn(X).

F send_warn(X) $\leftarrow$ in(F, surv : file(imagedb)) &
    in(X, surv :RV identify(F)) &
    in(L, geo :RV getplnode(X.location)) &
    in(L, geo :RV range(100, 100, 20)).

This agent operates according to two very simple rules. The first rule says that 
it sends a warning whenever it identifies an enemy vehicle as having a gun turret 
of type a1 with at least 70% probability, as long as sending such a warning is not
forbidden. The second rule says that sending a warning is forbidden if the enemy 
vehicle is within 20 units of distance from location (100, 100). 

Defining the semantics of such programs is beyond the scope of this paper;
we refer the reader to [1].

4 Serving Requests more Efficiently 

With the increase in agent-based applications, there are now agent systems that 
support concurrent client accesses. The ability to process large volumes of si- 
multaneous requests is critical in many such applications. In such a setting, the 
traditional approach of serving these requests one at a time via queues (e.g. FIFO 
queues, priority queues) is insufficient. In this section we review the approach
of [4]. The overall idea is that for a given set of requests one needs to

1. identify commonalities among them. This information can be used to sim- 
plify the set and merge some of the requests together. 

2. compute a single global execution plan that simultaneously optimizes the 
total expected cost of this set of code call conditions. 

Instead of sending many individual requests one after another, sending one large
merged request (from whose answer the answers to the original requests can
be deduced) can already save a lot of network time.

4.1 Invariants

How can we detect commonalities? Obviously, we need input from the agent de- 
veloper. In our framework, an agent developer specifies several parameters. One 
of these parameters must include some domain-specific information, explicitly 
laying out what inclusion and equality relations are known to hold of code calls. 
Such information is specified via invariants. An important ingredient of their
definition is the notion of an invariant expression.

Definition 12 (Invariant Expression). 

- Every evaluable code call condition is an invariant expression. We call such
expressions atomic.

- If $ie_1$ and $ie_2$ are invariant expressions, then $(ie_1 \cup ie_2)$ and $(ie_1 \cap ie_2)$ are
invariant expressions. (We will often omit the parentheses.)

Example 5. Two examples of invariant expressions are:

in(StudentRec, rel : select(courseRel, exam, "=", midterm1)) &
in(C, excel : chart(excelFile, StudentRec, grade))

in(X, spatial : horizontal(T, B, U)) $\cup$ (in(Y, spatial : horizontal(T', B', U')) $\cup$
in(Z, spatial : horizontal(T', B', U))).

What is the meaning, i.e. the denotation, of such expressions? The first
invariant expression represents the set of all objects c such that

in(StudentRec, rel : select(courseRel, exam, "=", midterm1)) &
in(c, excel : chart(excelFile, StudentRec, grade))

holds: we are looking for instantiations of C. Note that under this viewpoint, the
intermediate variable StudentRec, which is needed in order to instantiate C to
an object c, does not matter. There might just as well be situations where we are
interested in pairs (c, studentrec) instead of just c.

Definition 13 (Invariant Condition (ic)). An invariant condition atom is a
statement of the form $t_1 \; Op \; t_2$ where $Op \in \{\le, \ge, <, >, =\}$ and each of $t_1$, $t_2$ is
either a variable or a constant. An invariant condition (ic) is defined inductively
as follows:

1. Every invariant condition atom is an ic.

2. If $C_1$ and $C_2$ are ic's, then $C_1 \wedge C_2$ and $C_1 \vee C_2$ are ic's.

Definition 14 (Invariant inv, INV). An invariant, denoted by inv, is a
statement of the form

$ic \Longrightarrow ie_1 \; \Re \; ie_2$ (2)

where

1. ic is an invariant condition such that all variables occurring in ic are among
$var_{base}(ie_1) \cup var_{base}(ie_2)$,

2. $\Re \in \{=, \subseteq\}$, and

3. $ie_1$, $ie_2$ are invariant expressions.

If $ie_1$ and $ie_2$ both contain solely atomic code call conditions, then we say that inv
is a simple invariant. If ic is a conjunction of invariant condition atoms, then
we say that inv is an ordinary invariant. The set of all invariants is denoted by
INV.

The invariant

Rel = Rel' $\wedge$ Attr = Attr' $\wedge$ Op = Op' = "$\le$" $\wedge$ Val < Val' $\Longrightarrow$
in(X, rel : select(Rel, Attr, Op, Val)) $\subseteq$ in(Y, rel : select(Rel', Attr', Op', Val'))

says that the code call condition in(X, rel : select(Rel, Attr, Op, Val)) can be
evaluated by using the results of the ccc in(Y, rel : select(Rel', Attr', Op', Val')) if
the above conditions are satisfied. Note that this expresses semantic information
that is not available on the syntactic level: the operator "$\le$" is related to
the relation symbol "<".
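
The use of such an invariant can be sketched in a few lines of Python. The representation below is an assumption made for illustration only: a select call is a tuple (Rel, Attr, Op, Val), a relation is a list of dictionaries, and the function names are hypothetical. The sketch checks the invariant condition and, if it holds, answers the narrower select by filtering the already computed result of the wider one.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
SelectCall = Tuple[str, str, str, float]  # (Rel, Attr, Op, Val)


def invariant_applies(narrow: SelectCall, wide: SelectCall) -> bool:
    """Rel = Rel' and Attr = Attr' and Op = Op' = "<=" and Val < Val'."""
    rel, attr, op, val = narrow
    rel2, attr2, op2, val2 = wide
    return rel == rel2 and attr == attr2 and op == op2 == "<=" and val < val2


def answer_from_wide(narrow: SelectCall, wide: SelectCall,
                     wide_rows: List[Row]) -> List[Row]:
    """Evaluate the narrow select on the cached result of the wide select."""
    if not invariant_applies(narrow, wide):
        raise ValueError("invariant does not license reuse of the wide result")
    _, attr, _, val = narrow
    return [row for row in wide_rows if row[attr] <= val]


# Cached result of select(courseRel, grade, "<=", 80) on a toy relation.
wide_call = ("courseRel", "grade", "<=", 80)
wide_result = [{"student": "ann", "grade": 50}, {"student": "bob", "grade": 70}]
narrow_call = ("courseRel", "grade", "<=", 60)
narrow_result = answer_from_wide(narrow_call, wide_call, wide_result)
```

The point of the sketch is that the relation courseRel itself is never accessed when the narrow call is answered.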

4.2 Merging Requests 

Let us suppose now that we have a set $\mathcal{I}$ of invariants, and a set S of data
structures that are manipulated by the agent. How exactly should a set C of 
code call conditions be merged together? And what needs to be done to support 
this? Our architecture contains two parts: 

(i) a development time phase stating what the agent developer must specify 
when building her agent, and what algorithms are used to operate on that 
specification, and 

(ii) a deployment time phase which specifies how the above development-time 
specifications are used when the agent is in fact running autonomously. 



Development Time Phase. When the agent developer builds her agent, the 
following things need to be done. 

1. First, the agent developer specifies a set $\mathcal{I}$ of invariants.

2. Suppose C is a set of CCCs to be evaluated by the agent. Each code call
condition $\chi \in C$ is represented via an evaluable cceg (see Figure 3 in
Subsection 1.3). Let INS(C) represent the set of all nodes in ccegs of $\chi$'s in C:

$INS(C) = \{ v_i \mid \exists \chi \in C \text{ s.t. } v_i \text{ is in } \chi\text{'s cceg} \}$.

This can be done by a topological sort of the cceg for each $\chi \in C$.

3. Additional invariants can be derived from the initial set $\mathcal{I}$ of invariants. This
requires the ability to check whether a set $\mathcal{I}$ of invariants implies an inclusion
relationship between two invariant expressions. Although we have defined a
formally precise notion of a set of invariants implying other invariants, we
will provide a generic test called Chk_Imp for implication checking between
invariants. There are various instances of Chk_Imp that are sound but not
complete, thereby allowing us to specify various parameters and heuristics.
Given an arbitrary (but fixed) Chk_Imp test, we will provide an algorithm
called Compute-Derived-Invariants that calculates the set of derivable
invariants from the initial set $\mathcal{I}$ of invariants and needs to be executed just
once.
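
Step 2 above, collecting the nodes of all ccegs by topological sorting, can be sketched as follows. The encoding is an assumption of this example: each cceg is a dictionary mapping a node (a code call atom, written as a string) to the set of nodes it depends on, and the function name ins is hypothetical. The sketch relies on graphlib from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

Cceg = Dict[str, Set[str]]  # node -> nodes that must be evaluated before it


def ins(ccegs: List[Cceg]) -> List[str]:
    """Collect the nodes of all ccegs, each node once, respecting the
    evaluation order inside every single cceg."""
    nodes: List[str] = []
    seen: Set[str] = set()
    for cceg in ccegs:
        for node in TopologicalSorter(cceg).static_order():
            if node not in seen:
                seen.add(node)
                nodes.append(node)
    return nodes


# Toy cceg for the first invariant expression of Example 5: the chart atom
# needs StudentRec, which is produced by the select atom.
select_atom = "in(StudentRec, rel:select(courseRel, exam, =, midterm1))"
chart_atom = "in(C, excel:chart(excelFile, StudentRec, grade))"
cceg1: Cceg = {chart_atom: {select_atom}, select_atom: set()}
cceg2: Cceg = {select_atom: set()}
all_nodes = ins([cceg1, cceg2])
```

Since the select atom occurs in both graphs, it appears only once in the collected list.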



Deployment Time Phase. Once the agent has been “developed” and deployed 
and is running, it will need to continuously determine how to merge a set C of 
code call conditions. This will be done as follows: 

1. The system identifies three types of relationships between nodes in INS(C).

Identical ccc's: First, we'd like to identify nodes $\chi_1, \chi_2 \in INS(C)$ which
are "equivalent" to one another, i.e. $\chi_1 = \chi_2$ is a logical consequence
of the set of invariants $\mathcal{I}$. This requires a definition of equivalence of
two code call conditions w.r.t. a set of invariants. This strategy is useful
because we can replace the two nodes $\chi_1, \chi_2$ by a single node. This avoids
redundant computation of both $\chi_1$ and $\chi_2$.

Implied ccc's: Second, we'd like to identify nodes $\chi_1, \chi_2 \in INS(C)$ which
are not equivalent in the above sense, but such that either $\chi_1 \subseteq \chi_2$ or
$\chi_2 \subseteq \chi_1$ holds, but not both. Suppose $\chi_1 \subseteq \chi_2$. Then we can compute
$\chi_2$ first, and then compute $\chi_1$ from the answer returned by computing
$\chi_2$. This way of computing $\chi_1, \chi_2$ may be faster than computing them
separately.

Overlapping ccc's: Third, we'd like to identify nodes $\chi_1, \chi_2 \in INS(C)$ for
which the preceding two conditions do not hold, but $\chi_1 \,\&\, \chi_2$ is consistent
with INS(C). In this case, we might be able to compute the answer to
$\chi_1 \vee \chi_2$. From the answer to this, we may compute the answer to $\chi_1$
and the answer to $\chi_2$. This way of computing $\chi_1, \chi_2$ may be faster than
computing them separately.

We will provide an algorithm, namely Improved-CSI, which will use the set
of derived invariants returned by the Compute-Derived-Invariants
algorithm above, to detect commonalities (equivalent, implied and overlapping
code call conditions) among members of C.

Example 6. The two code call conditions in(X, spatial : vertical(T, L, R)) and
in(Y, spatial : vertical(T', L', R')) are equivalent to one another if their
arguments are unifiable. The result of evaluating the code call condition

in(Z, spatial : range(T, 40, 50, 25))

is a subset of the result of evaluating the code call condition

in(W, spatial : range(T', 40, 50, 50))

if T = T'. Note that spatial : range(T, X, Y, Z) returns all points in T that are
within Z units of the point (X, Y). In this case, we can compute the results
of the former code call condition by executing a selection on the results of
the latter rather than executing the former from scratch. Finally, consider
the following two code call conditions:

in(X, spatial : horizontal(map, 100, 200)),
in(Y, spatial : horizontal(map, 150, 250)).

Here spatial : horizontal(map, a, b) returns all points (X, Y) in map such that
$a \le Y \le b$. Obviously, the result of neither of these two code call conditions
is a subset of the result of the other. However, the results of these two code
call conditions overlap with one another. In this case, we can execute the
code call condition in(Z, spatial : horizontal(map, 100, 250)). Then, we can
compute the results of the two code call conditions by executing selections
on the results of this code call condition.

2. We will then provide two procedures to merge sets of code call conditions,
BFMerge and DFMerge, that take as input (i) the set C, (ii) the output
of the Improved-CSI algorithm above, and (iii) a cost model for agent
code call condition evaluations. Both these algorithms are parameterized by
heuristics, and we propose three alternative heuristics. Then we evaluate our
six implementations (3 heuristics times 2 algorithms) and also compare them
with an A*-based approach.

We implemented both these algorithms on top of the IMPACT agent development
platform, and on top of a (non-IMPACT) geographic database agent.
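
The three kinds of relationships, and the merging of overlapping requests from Example 6, can be illustrated with a small sketch. It is not the Improved-CSI, BFMerge or DFMerge algorithm: it handles only horizontal code calls on one and the same map, represents such a call by its band (a, b), and all function names are assumptions of this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]   # (X, Y)
Band = Tuple[float, float]    # (a, b), selecting points with a <= Y <= b


def classify(b1: Band, b2: Band) -> str:
    """Relationship between two horizontal calls on the same map."""
    if b1 == b2:
        return "identical"
    if (b2[0] <= b1[0] and b1[1] <= b2[1]) or (b1[0] <= b2[0] and b2[1] <= b1[1]):
        return "implied"       # one result is a subset of the other
    if max(b1[0], b2[0]) <= min(b1[1], b2[1]):
        return "overlapping"
    return "unrelated"


def horizontal(points: List[Point], band: Band) -> List[Point]:
    return [p for p in points if band[0] <= p[1] <= band[1]]


def merged_answers(points: List[Point], b1: Band, b2: Band):
    """Evaluate one merged call on the source, then two cheap selections."""
    merged = (min(b1[0], b2[0]), max(b1[1], b2[1]))
    shared = horizontal(points, merged)   # the only access to the source
    return horizontal(shared, b1), horizontal(shared, b2)


map_points = [(0, 120), (5, 160), (9, 240), (3, 300)]
ans1, ans2 = merged_answers(map_points, (100, 200), (150, 250))
```

For the bands (100, 200) and (150, 250) the merged call uses the band (100, 250), exactly as in Example 6, and the two original answers are recovered by selections on its result.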

4.3 Results 

Development Phase. The definition of a sound and complete instance of
Chk_Imp is based on the definition of a certain monotone fixpoint operator,
the least fixpoint of which constitutes the set of implied invariants ([4]).
Completeness is proved by reducing the problem to the completeness of a particular
first-order calculus and using a Henkin-like construction.

Proposition 1 (co-NP Completeness of Checking Implication).

Suppose all datatypes have a finite domain (i.e. each datatype has only finitely
many values of that datatype). Then the problem of checking whether an
arbitrary invariant expression $ie_1$ implies another invariant expression $ie_2$ is
co-NP-complete. The same holds for the problem of checking whether an invariant is a
tautology.

We have therefore studied the tradeoffs involved in using sound, but perhaps 
incomplete implementations of implication checking. 

There are clearly many ways of implementing the algorithm Chk_Imp that
are sound, but not complete. We considered a generic algorithm to implement
Chk_Imp, where the complexity can be controlled by two input parameters: an
axiomatic inference system and a threshold.
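
The idea of a sound but incomplete test controlled by a threshold can be sketched as follows. This is not the Chk_Imp algorithm of [4]: as a deliberately tiny stand-in for the axiomatic inference system, the sketch uses only reflexivity and transitivity of inclusion over atomic expression names, and the threshold bounds the number of chaining steps. The function name and the encoding of invariants as pairs are assumptions of this example.

```python
from collections import deque
from typing import Set, Tuple

Inclusion = Tuple[str, str]  # (a, b) stands for the known invariant a ⊆ b


def chk_imp(invariants: Set[Inclusion], lhs: str, rhs: str, threshold: int) -> bool:
    """Return True only if lhs ⊆ rhs is derivable by at most `threshold`
    chaining steps. A False answer may be wrong (incomplete), a True answer
    is always justified by the given invariants (sound)."""
    if lhs == rhs:
        return True
    seen = {lhs}
    frontier = deque([(lhs, 0)])
    while frontier:
        expr, depth = frontier.popleft()
        if depth == threshold:
            continue  # budget exhausted along this derivation
        for a, b in invariants:
            if a == expr and b not in seen:
                if b == rhs:
                    return True
                seen.add(b)
                frontier.append((b, depth + 1))
    return False


known = {("A", "B"), ("B", "C"), ("C", "D")}
```

With these invariants, A ⊆ D is found with threshold 3 but missed with threshold 2, which illustrates the trade-off between the cost of the test and its completeness.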

Deployment Phase. We developed two algorithms (and various accompanying
heuristics) which allow an agent to automatically rewrite requests so as to avoid
redundant work; these algorithms take invariants associated with the agent into
account. Our algorithms are independent of any specific agent framework. We
implemented both these algorithms on top of the IMPACT agent development
platform, and on top of a (non-IMPACT) geographic database agent. Based on
these implementations, we conducted experiments which show that our algorithms
are considerably more efficient than methods based on the well-known algorithm
in [10] for merging multiple relational-database-only queries using the A*
algorithm. Our experiments show that although the A* algorithm finds better
global results, the cost of obtaining those results is so prohibitively high that
A* is often infeasible to use in practice.

Figure 6 shows that the execution time for determining overlapping code
calls is still below one second for a set of 20 ccc's. Similar times are obtained
for equivalent and implied ccc's. We also noted that there are often more ccc's
falling in the implied or overlapping categories than in the equivalence category.
As the methods based on the A* algorithm search only for the equivalence category,
our optimizations pay off.

We have also shown that our merging algorithms (1) can handle more
than twice as many simultaneous code call conditions as the A* algorithm, (2)
run 100 to 6300 times faster than the A* algorithm, and (3) produce execution
plans whose cost is at most 10% more than that of the plans generated by the
A* algorithm.



Fig. 6. Execution Time of Merge Algorithms with overlapping ccc Sets (plot title: Execution Time of the Algorithms with Type 3 CCC Sets)

5 Conclusion

We have illustrated three powerful extensions to the basic IMPACT frame- 
work: incorporating planning, incorporating uncertain reasoning and optimizing 
queries sent over a network. While the first extension required an agentization 
procedure to incorporate an efficient HTN planner into IMPACT, the second ex- 
tension extended the notion of a code call to one dealing with random variables 
and required heavy use of annotated logic programming. The third extension 
required fixpoint techniques and automated reasoning mechanisms (to prove the 
completeness result). 

The semantics of the basic framework as well as of the extensions described 
are based on the notion of an agent program and thus are very much related 
to nonmonotonic formalisms like the stable and well-founded semantics. We can
conclude that any formal approach to heterogenous agent systems can benefit a
lot from Computational Logic, to which all the above techniques belong.

Abstract. Lixto is a system and method for the visual and interactive
generation of wrappers for Web pages under the supervision of a human 
developer, for automatically extracting information from Web pages us- 
ing such wrappers, and for translating the extracted content into XML. 
This paper describes some advanced features of Lixto, such as disjunctive 
pattern definitions, specialization rules, and Lixto’s capability of collect- 
ing and aggregating information from several linked Web pages. 



1 Introduction and Motivation 

Extracting relevant information automatically from HTML Web pages of chang- 
ing content, and converting the extracted information to a structured repre- 
sentation is an important problem, to which a lot of research has been ded- 
icated [3,7,8,10,11,13,14]. XML was designed to enrich the semantics of Web 
information [1,6]. Even if in some respects XML may not yet fulfill this goal per- 
fectly, XML appears to be the right representation format for the information 
extracted from HTML. Programs that perform such extraction and translation 
tasks are referred to as wrappers. Wrappers can be hand-coded, e.g. in spe- 
cialized languages such as Jedi [9] or Florid [12], or they can be produced via 
wrapper generators. Wrapper generators are software tools that generate
wrappers via induction (e.g. [2,10,13]) or that semi-automatically support
the generation of wrappers via an interactive process supervised by a human 
designer ([11,14]). Wrapper generators support the task of reverse engineering, 
as the goal of a wrapper is to reverse the processing of dynamic Web sites that 
generate HTML starting from an internal structured representation (such as a 
relational database).

In a recent paper [5] we introduced Lixto, a new method and system for
visually generating HTML/XML wrappers under the supervision of a human
designer. Lixto allows a wrapper designer to interactively and visually define
information extraction patterns on the basis of visualized sample Web pages.
These extraction patterns are collected into a hierarchical knowledge base that
constitutes a declarative wrapper program. The extraction knowledge is
internally represented in a Datalog-like special-purpose logic programming language,
called Elog. However, a user of Lixto is not concerned with the syntax of Elog
and does not need to learn this language, as she constructs an Elog wrapper
program by purely visual and interactive primitives without ever seeing the
resulting Elog program. Wrapper programs in Elog can be directly executed over
input Web sites by an extractor module that interprets the Elog rules, taking
care of the evaluation of special built-in predicates. Lixto also allows a designer
to define XML translation rules that specify how extracted content should be
translated into XML, a so-called XML translation scheme. An XML translation
scheme together with extraction pattern definitions (the Elog program) in
addition enables the system to construct a Document Type Definition (DTD) which
describes the characteristics of the output XML documents.

* All new methods and algorithms of the Lixto system are covered by a pending patent.
Future developments of Lixto will be reported at www.lixto.com.

The advantages of the Lixto wrapper generator over competing approaches
are mainly the following. (1) Very high expressive power, i.e., an unprecedented
capability of defining sophisticated extraction patterns. (2) Excellent visual
support: the wrapper designer's sole view of an example HTML document is the
browser-displayed standard image of the document (no annotations, overlays,
HTML sources or DOM trees), and the wrapper designer uses this display
directly for marking extraction patterns. (3) Good learnability, because no extraction
language needs to be learned and neither HTML nor XML knowledge is
necessary. (4) Sample parsimony, which means that very few sample pages (in most
cases a single one) are needed in order to define robust wrappers for large classes
of Web pages. (5) A simple and smooth XML translation mechanism that gives
a designer several options for formatting or modifying the XML output.

Basic features of Lixto are described in [4,5], where a comparison to
related research is also given. The main goal of the present paper is to introduce and
illustrate some of the more advanced features of the Elog language. All the pre- 
sented advanced features can be visually created by using Lixto without knowing 
Elog. Details of the visual interface and the way of creating patterns can be found 
in [4] and [5], where a precise description of the pattern generation algorithm is
given. There, these details are discussed for a restricted environment w.r.t. some 
advanced concepts discussed in this paper, but a quite similar approach can be 
used for these advanced features. The present paper is self-contained at the level 
of general description, but not at the level of details. For the latter, we refer 
to [5]. 

Among the advanced features we discuss here are disjunctive wrapping, i.e., 
defining one pattern through several alternative definitions; pattern specializa- 
tion, i.e., defining a new pattern by restricting another pattern; interactively 
defining new document patterns, which are patterns corresponding to entire doc- 
uments that are identified via extracted URLs; Web crawling, which, in this con- 
text, means that a pattern hierarchy is built that aggregates information from 
various Web pages by starting at a given input page and automatically following 
URLs to other pages; and recursive wrapping, which means that recursive pattern
structures (akin to recursive data types) can be constructed that allow the
system to crawl to an indefinite number of Web pages and extract information
from all these pages. We will also discuss some interesting nonmonotonic issues 
such as pattern minimization principles and the semantics of range restrictions. 
Moreover, this paper introduces pattern graphs for describing the structure of the 
pattern hierarchy interactively defined by a designer (see Figures 3, 4, 6, and 7).
Note that pattern graphs for simple extraction tasks are trees, which means that 
there is a strict pattern hierarchy. When disjunctive pattern definitions are used, 
then the corresponding pattern graphs are dags, while with recursive wrapping 
they are cyclic graphs. 
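
The three shapes of pattern graphs can be told apart mechanically, as the following Python sketch shows. The encoding is an assumption of this example: a pattern graph is a dictionary mapping each pattern name to the list of its child patterns, with a single root such as the home document pattern, and the function name graph_shape is hypothetical. The sketch uses graphlib from the Python standard library (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def graph_shape(children: Dict[str, List[str]]) -> str:
    """Classify a single-rooted pattern graph as 'tree', 'dag' or 'cyclic'."""
    parents: Dict[str, Set[str]] = {}
    for parent, kids in children.items():
        parents.setdefault(parent, set())
        for kid in kids:
            parents.setdefault(kid, set()).add(parent)
    try:
        # static_order raises CycleError if the graph contains a cycle.
        tuple(TopologicalSorter(parents).static_order())
    except CycleError:
        return "cyclic"
    # Acyclic: a strict hierarchy means that no pattern has two parents.
    if all(len(p) <= 1 for p in parents.values()):
        return "tree"
    return "dag"


simple = {"document": ["record"], "record": ["name", "price"]}
disjunctive = {"document": ["record", "offer"], "record": ["price"], "offer": ["price"]}
recursive = {"document": ["nextlink"], "nextlink": ["document"]}
```

In the second graph the pattern price has two parent patterns, as happens when two filters of one pattern extract from different parents; in the third, a document pattern reached through an extracted link leads back to the document pattern itself.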

The paper is structured as follows. In the next two sections we give an 
overview of Lixto and a description of the basic features of the Elog language. 
Section 4 takes a closer look at some features. In Section 5 we illustrate the
power of disjunctive pattern descriptions, whereas in Section 6 some light is 
shed on Elog’s aspects concerning link crawling and recursion. These sections 
introduce advanced features of the internal language of Lixto both with an ab- 
stract description and examples from the commercial domain. Section 7 discusses 
various nonmonotonic aspects of Lixto such as minimization, range conditions, 
and further recursive aspects introduced by pattern references. 



2 Pattern Generation with Lixto 

Architecture. The Lixto prototype consists of two main blocks: The Wrapper 
Generator and the Program Evaluator. One module of the wrapper generator, 
the Interactive Pattern Builder, allows a wrapper designer to create and to store
a wrapper in the form of an extraction program (a program in the language Elog).
Moreover, the wrapper generator contains the XML Translation Builder that
allows a designer to specify how extracted data should be translated into XML
format and to store such a specification in the form of an XML translation scheme. The
program evaluator automatically executes an extraction program (performed by 
the Extractor module) and a corresponding XML translation scheme (performed 
by the XML translator module) over Web pages by extracting data from them 
and translating the extracted data into XML format. (For details see [5].) 

Extraction Patterns. A wrapper is constructed by formalizing, collecting, 
and storing the knowledge about desired extraction patterns. Extraction pat- 
terns describe single data items or chunks of coherent data to be extracted from 
Web pages by their locations and by their characteristic internal or contextual 
properties. Extraction patterns are generated and refined interactively and
semi-automatically with the help of a human wrapper designer. They are constructed in
a hierarchical fashion on sample pages by marking relevant items or regions via 
mouse clicks or similar actions, by menu selections, and/or by simple textual 
inputs to the user interface. A wrapper, in our approach, is thus a knowledge 
base consisting of a set of extraction patterns. 

While patterns are descriptions of data to be extracted, pattern instances 
are concrete data elements on Web pages that match such descriptions, and 
hence are extracted. Lixto distinguishes different types of patterns: tree, string,
and document patterns. Tree patterns serve to extract parts of documents
corresponding to tree regions, i.e., to subtrees of their parse tree. String patterns
serve to extract textual strings from visible and invisible parts of a document (an 
invisible part could be, e.g., an attribute value such as the name of an image). 
Document patterns are used for navigating to further Web pages. 

Logical Organization of Patterns. The logical organization of an extraction 
pattern is as follows: each extraction pattern has a name and contains one or 
more so-called filters. Each filter provides an alternative definition of data to be 
extracted and to be associated with the pattern. The set of filters of a pattern is 
interpreted disjunctively (i.e., connected by logical ORs). Each filter is associated
with a parent pattern from which it extracts the desired information. Tree (string)
patterns are specified via tree (string) filters. 

A tree filter contains a representation of a generalized parse tree path that 
matches a set of items on a Web page, and contains a set of conditions that these 
items must satisfy. All the conditions of a filter are interpreted conjunctively, i.e., 
an element of a Web page satisfies a filter if and only if it matches its generalized 
tree path and satisfies all the conditions of the filter. Similarly, a string filter 
specifies the characteristics of the text to be extracted (using a formal language), 
and possibly further conditions. 
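
The disjunctive reading of patterns and the conjunctive reading of filters can be summarized in a short Python sketch. It is not Elog and does not reflect Lixto's internal data structures: page elements are modelled as plain dictionaries, parent patterns are ignored, the generalized tree path is reduced to a predicate, and all class and field names are assumptions of this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Element = Dict[str, str]
Predicate = Callable[[Element], bool]


@dataclass
class Filter:
    path_matches: Predicate                      # stand-in for the tree path
    conditions: List[Predicate] = field(default_factory=list)

    def accepts(self, element: Element) -> bool:
        # Conjunctive: the path AND every condition must hold.
        return self.path_matches(element) and all(c(element) for c in self.conditions)


@dataclass
class Pattern:
    name: str
    filters: List[Filter]

    def instances(self, elements: List[Element]) -> List[Element]:
        # Disjunctive: an element is an instance if ANY filter accepts it.
        return [e for e in elements if any(f.accepts(e) for f in self.filters)]


page = [
    {"tag": "td", "text": "$ 12.50"},
    {"tag": "td", "text": "Book"},
    {"tag": "span", "class": "price", "text": "EUR 9"},
    {"tag": "span", "class": "note", "text": "$ off"},
]
price = Pattern("price", [
    Filter(lambda e: e["tag"] == "td", [lambda e: "$" in e["text"]]),
    Filter(lambda e: e["tag"] == "span", [lambda e: e.get("class") == "price"]),
])
found = price.instances(page)
```

Adding a condition to a filter can only shrink the set of instances, whereas adding a filter to a pattern can only enlarge it.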

Lixto offers a wrapper designer the possibility to express various types of 
conditions restricting the intended pattern instances of a filter. The main types 
of conditions are inherent (internal) conditions, contextual (external) conditions, 
and range conditions. In addition to these three basic types of conditions, Lixto 
allows a designer to express auxiliary conditions like pattern reference conditions, 
concept conditions and comparison conditions. They are discussed as atoms of 
the Elog language in more detail in Section 3. 

Visual Pattern Generation. Extraction patterns are defined by the designer in 
a hierarchical manner. A pattern that describes an entire document is referred to 
as a document pattern. In particular, the document pattern corresponding to the 
starting Web page, the so-called “home document pattern”, is available as a
pre-existing pattern. Other patterns are defined interactively. Filters or patterns are
usually defined in the context of other patterns (so-called parent patterns). For
example, a pattern <name> may be defined first, and then patterns <firstname>
and <familyname>, etc., may be defined in the context of the source pattern
<name>. For the majority of common extraction tasks, defining flat patterns
or a strict hierarchy of patterns will in practice be sufficient. However, Lixto
does not limit the pattern definition to be strictly hierarchical (i.e. tree-like).
Moreover, pattern definitions are allowed to be recursive (similar to recursive
type definitions in programming languages). While patterns are not required
to form a strict hierarchy, pattern instances do always form one and can be
arranged as a tree (or forest, in case they stem from different documents, which
can be the case in recursive programs as explained in Section 6).

The visual and interactive pattern definition method allows a wrapper de- 
signer to define an extraction program and an associated XML translation 
scheme without any programming efforts. The Lixto Interactive Pattern Builder
allows a wrapper designer to define filters and patterns with the help of one or
more characteristic example pages, and to modify and store patterns. At various 
intermediate steps, the designer may test a partially or fully constructed filter 
or pattern, both on the example pages used to construct the pattern as well as 
on any other Web page. The result of such a test is a set of pattern instances, 
which is displayed by a browser as a set of highlighted items. 

The filter description procedure for tree-filters can be described as follows: 
The designer marks an initial element on an example Web page (for example, 
a table). The system associates with this element a generalized tree path of the 
parse tree that (possibly) corresponds to several similar items (for example, sev- 
eral tables). The designer then tests the filter for the first time. If more than just 
the intended data items are extracted (and thus highlighted) as a result of the 
test, then the designer adds restrictive conditions to the filter and tests the filter 
again. This process is repeated as long as undesired data items are extracted. At 
the end of the process, the filter extracts only desired items. A similar procedure 
is used for designing string filters. However, for creating a string rule usually no 
example is selected, but some characterizations are visually composed, e.g. by 
relying on concept conditions. A pattern is designed by initially asserting one 
filter for the pattern, and, in case this is not sufficient (because testing shows 
that not all intended extraction items on the test pages are covered), by asserting 
successively more filters for the pattern under construction, until each intended 
extraction item is covered by at least one filter associated to that pattern. 

Observe that the methods of filter construction and pattern construction 
correspond to methods of definition-narrowing and definition-broadening that 
match the conjunctive and disjunctive nature of filters and patterns, respectively. 
It is the responsibility of the wrapper designer to perform sufficient testing and, 
if required by the particular application, to test filters and patterns also on Web 
pages different from the initially chosen example pages. Moreover, it is up to the 
wrapper designer to choose suitable conditions that will work not only on the 
test pages, but also on all other target Web pages. 
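The narrowing and broadening steps above can be pictured with a small Python sketch. It is only an illustration under simplifying assumptions: items are plain strings, conditions are Boolean functions, and the names `make_filter`, `make_pattern`, and the sample data are hypothetical and not part of Lixto.

```python
from typing import Callable, Iterable

Condition = Callable[[str], bool]


def make_filter(conditions: Iterable[Condition]) -> Condition:
    """A filter accepts an item only if every condition holds (conjunction)."""
    conds = list(conditions)
    return lambda item: all(c(item) for c in conds)


def make_pattern(filters: Iterable[Condition]) -> Condition:
    """A pattern accepts an item if at least one filter accepts it (disjunction)."""
    fs = list(filters)
    return lambda item: any(f(item) for f in fs)


def has_digit(s: str) -> bool:
    return any(ch.isdigit() for ch in s)


# Hypothetical candidate items found on some test pages.
items = ["$ 12.50", "£ 8.99", "12 bids", "$ off"]

# Adding a condition narrows a filter; adding a filter broadens the pattern.
usd = make_filter([lambda s: s.startswith("$"), has_digit])
gbp = make_filter([lambda s: s.startswith("£"), has_digit])
price = make_pattern([usd, gbp])

extracted = [i for i in items if price(i)]
print(extracted)
```

Each added condition can only shrink the set of matches of a filter, while each added filter can only enlarge the set of matches of the pattern.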

The visual and interactive support for pattern building offered by Lixto also 
includes specific support for the hierarchical organization of patterns and filters. 
A wrapper definition process according to Lixto (and consequently, a Lixto wrap- 
per) is not limited to a single sample Web document, and not even to sample Web 
pages of the same type or structure. During wrapper definition, a designer may 
move to other sample Web pages (i.e., load them into the browser), continuing 
the wrapper definition there. 

XML Translation. The XML Translation Builder, which constitutes another 
interactive module of the wrapper generator, is responsible for supporting a 
wrapper designer during the generation of the XML translation scheme. By de- 
fault, pattern names are used as output XML tags and the hierarchy of extracted 
pattern instances determines the structure of the output XML document. Thus, 
in case no specific action is taken by the designer, the pattern instances are 
translated into XML in a standard way without any need of further interac- 
tion. However, Lixto also offers the wrapper designer the option to modify the 
standard XML translation in various ways: renaming patterns, suppressing 
auxiliary patterns, writing some HTML attributes, and deciding whether in- 
stances of document patterns are all treated at the same level, or hierarchically 
ordered as defined by the extraction process. Moreover, to define a DTD based 
on an output, a wrapper designer can assign a multiplicity to each pattern, i.e. whether 
one or several instances are required/allowed to occur within a parent pattern. 

These desired modalities of the XML translation are determined during the 
wrapper design process by a very simple and user-friendly graphical interface and 
are stored in the form of an XML translation scheme that encodes the mapping 
between extraction patterns and the XML output in a suitable form. 
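The default translation and two of the modifications (renaming and suppressing auxiliary patterns) can be sketched as follows. This is a minimal sketch, not Lixto's actual translation scheme: the `Instance` class, the `translate` function, the sample instance tree, and the renamed tag `auctions` are all hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Instance:
    """One pattern instance with its extracted text and child instances."""
    pattern: str
    text: str = ""
    children: List["Instance"] = field(default_factory=list)


def translate(inst: Instance, rename: Dict[str, str], suppressed: Set[str],
              parent: Optional[ET.Element] = None) -> ET.Element:
    """Map an instance tree to XML: pattern names become tags by default,
    renamed patterns use the new tag, suppressed patterns emit no element."""
    if inst.pattern in suppressed:
        if parent is None:
            raise ValueError("the root pattern must be mapped to XML")
        target = parent  # children attach to the nearest emitted ancestor
    else:
        tag = rename.get(inst.pattern, inst.pattern)
        target = ET.Element(tag) if parent is None else ET.SubElement(parent, tag)
        if not inst.children:
            target.text = inst.text
    for child in inst.children:
        translate(child, rename, suppressed, target)
    return target


root = Instance("ebaydocument", children=[
    Instance("tableseq", children=[
        Instance("record", children=[
            Instance("itemdes", "Notebook"),
            Instance("price", "$ 10.00"),
        ]),
    ]),
])

xml_text = ET.tostring(
    translate(root, rename={"ebaydocument": "auctions"}, suppressed={"tableseq"}),
    encoding="unicode",
)
print(xml_text)
```

With an empty `rename` mapping and an empty `suppressed` set, the output simply mirrors the hierarchy of pattern instances, which corresponds to the standard translation described above.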

3 An Overview of the Elog Extraction Language 

As mentioned in the previous sections, patterns are internally represented us- 
ing the declarative extraction language Elog. The Elog language is specifically 
designed for hierarchical and modular data extraction and it is ideally suited 
for representing and successively incrementing the knowledge about extraction 
patterns. It uses a datalog-like syntax and semantics, enriched with several pre- 
defined predicates related to information extraction. An Elog program is a col- 
lection of rules containing special extraction atoms in their bodies. 

We illustrate the main characteristics of Elog using an example program 
which can be applied to eBay pages, e.g. to the sample page in Figure 1. Figure 2 
shows an Elog program applied to a category search result page of eBay. In 
the following examples, we additionally use a pattern graph to represent a Lixto 
wrapper. A pattern graph is a directed graph whose nodes represent patterns and 
an arc from a pattern p1 to a pattern p2 specifies that there is a filter defining p2 
that extracts information from instances of p1. Moreover, document, tree, and 
string patterns are represented using different shapes. Finally, it is possible to 
represent also information about the XML translation scheme using this graph. 
In particular, we specify that a pattern is translated to an XML element by 
writing a text “pattern name/elementname” into the pattern node. If the element 
name is missing, then the pattern name is used as default translation. The set 
of included attributes is embedded in a list, e.g. “[url, font]”, and patterns 
that are not translated are drawn with dashed lines. It is possible to specify a 
minimum and maximum multiplicity on the arcs (“[min, max]”) to provide the 
information used in the construction of the DTD (see the end of this section). 
When no multiplicity of a pattern is explicitly indicated in the pattern graph, 
then a minimum and maximum multiplicity of 1 for that pattern are assumed. 
The pattern graph of the program in Figure 2 is shown in Figure 3. In this case, 
as all filters of one pattern point to the same parent, it forms a tree. 
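To make the tree property concrete, the pattern graph can be stored as a list of arcs and checked mechanically. The following is a sketch only: the arc list is read off the parent references of the rules in Figure 2, and the helper `is_tree` is a hypothetical name, not a Lixto function.

```python
from typing import Dict, List, Set, Tuple

Arc = Tuple[str, str]  # (parent pattern, child pattern)

# Arcs derived from the parent-pattern atoms of the rules in Figure 2.
arcs: List[Arc] = [
    ("ebaydocument", "tableseq"),
    ("tableseq", "record"),
    ("record", "itemdes"),
    ("record", "price"),
    ("record", "bids"),
    ("record", "date"),
    ("price", "currency"),
    ("price", "pricewc"),
]


def is_tree(graph: List[Arc]) -> bool:
    """True if there is exactly one root, every other pattern has exactly
    one parent, and every pattern is reachable from the root."""
    parents: Dict[str, Set[str]] = {}
    children: Dict[str, List[str]] = {}
    for parent, child in graph:
        parents.setdefault(child, set()).add(parent)
        children.setdefault(parent, []).append(child)
    nodes = {n for arc in graph for n in arc}
    roots = nodes - set(parents)
    if len(roots) != 1 or any(len(p) != 1 for p in parents.values()):
        return False
    seen: Set[str] = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children.get(node, []))
    return seen == nodes


print(is_tree(arcs))
```

As soon as two filters of one pattern refer to different parents, the pattern has two incoming arcs and the check fails, which is the situation exploited later for disjunctive and recursive patterns.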

An extraction program consists of a set of patterns. In Elog, a pattern p is 
represented by a set of rules that all have the same head atom of the form p(S, X). 
Elog rules define elements to be extracted from Web pages. Each rule corresponds 
to one filter. The head of an Elog rule r is always of the form p(S, X), where p 
is a pattern name, S is a variable which is bound in the body of the rule to the 
parent-pattern instances of the filter corresponding to r, and X is the target 
variable which, at extraction time, is bound to some target pattern instance to 
be extracted (either a tree region or a textual string). The body of an Elog rule 
contains atoms that jointly restrict the intended pattern instances. For example, 
an Elog rule corresponding to a tree filter contains in its body an atom expressing 
that the desired pattern instances should match a certain tree path and another 
atom that binds the variable S to a parent-pattern instance. 

Fig. 1. Sample eBay page 

In the example program, the pattern <tableseq> is used to extract a se- 
quence of tables which represent records. Observe that in each search result 
page of eBay, a record is a whole table consisting of a single table row. This 
sequence of tables is required to be preceded by a table which contains the word 
“Current”, and to be followed by an image representing a horizontal line. 
ebaydocument(S, X) ← getDocument(S = $1, X)
tableseq(S, X) ← ebaydocument(_, S),
    subsq(S, (*.body.*.center, []), (.table, []), (.table, []), X),
    before(S, X, (*.tr, [(elementtext, Current, substr)]), 0, 0, _),
    after(S, X, (*.img, [(src, spacer.gif, substr)]), 0, 0, _, _)
record(S, X) ← tableseq(_, S), subelem(S, .table, X)
itemdes(S, X) ← record(_, S), subelem(S, (*.td.*.content, [(href, , substr)]), X)
price(S, X) ← record(_, S),
    subelem(S, (*.td, [(elementtext, \var[Y].*, regvar)]), X),
    isCurrency(Y)
bids(S, X) ← record(_, S), subelem(S, *.td, X), before(S, X, .td, 0, 30, Y, _),
    price(_, Y)
date(S, X) ← record(_, S), subelem(S, *.td, X), notafter(S, X, .td, 100)
currency(S, X) ← price(_, S), subtext(S, \var[Y], X), isCurrency(Y)
pricewc(S, X) ← price(_, S), subtext(S, [0-9]+\.[0-9]+, X)

Fig. 2. Elog Extraction Program for a single eBay page 



The rule with head predicate record(S, X) in Figure 2 identifies all tables 
within a specific area, which is the instance of tableseq(_, S). For each ground 
atom tableseq(p, s) (where p and s are tree regions), this rule derives atoms 
of the form record(s, x) for each table x contained in s. Thus the variable S 
identifies the context of the extraction; in this case, these are the instantiations 
of tableseq. Optionally, the body of an Elog rule may contain further atoms 
expressing conditions that the pattern instances should additionally satisfy. In 
particular, for each type of condition, there exists a built-in predicate (see below). 

The description of each item (occurring in the second column of each record) 
is determined by the extraction rule whose head is itemdes(S, X). The first 
atom in the rule body specifies that the context S of the extraction is a table 
and ensures that the variable S is instantiated with a table. The second atom in 
the rule body looks for subelements of the table that qualify as table columns 
with some specific properties, in particular requiring that they contain a link 
(href). The rule has as many matches as there are items on the given page. 
If the Web page is updated and two new records are inserted into the table, 
then the same rule will produce two more matches. Each match gives rise to a 
corresponding instantiation of the variable X. 

Thus, the head predicates defined by an Elog program represent the extrac- 
tion patterns defined by the wrapper program. For instance, the program in 
Figure 2 defines patterns such as <record> and <itemdes>. Elog rule bodies contain 
the following important ingredients. For a more detailed discussion about 
Elog predicates see Section 4.4 of [5]. 

Fig. 3. Pattern Structure of Example of Figure 2 

Incompletely specified tree paths. These refer to the position(s) of the desired 
element(s) in the HTML tree. More details on the used document model 
are specified in [5]. There are various ways to specify a tree path pointing to, 
e.g., a table row in an eBay page. The fully specified tree path to this node 
is body.table.tr (the elements satisfying these paths are referred to as matched 
pattern instances). Two incompletely specified tree paths to the same node are 
.*.body.*.tr and .*.body.*.table.*.tr, where the star signs are wildcards 
(the dots just act as concatenation signs). An incompletely specified tree path 
.*.name is an abbreviation of the skip-to sequence (Σ − name)* name, where Σ is 
the alphabet of element types. The first discovered elements of the type “name” 
are considered in all possible paths. Observe that, interpreting the star in this 
way, a tree path .*.table identifies only the outermost tables in a document, and 
hence acts as some kind of minimization. 
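The skip-to reading of the star can be illustrated with a short Python sketch over a toy element tree. The `Node` class and the function `skip_to` are hypothetical stand-ins for the HTML parse tree and are not taken from the Lixto implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A toy element node of a parse tree."""
    name: str
    children: List["Node"] = field(default_factory=list)


def skip_to(root: Node, name: str) -> List[Node]:
    """Evaluate the path .*.name below root: on every downward path, skip
    elements not called `name` and stop at the first element that is."""
    matches: List[Node] = []
    for child in root.children:
        if child.name == name:
            matches.append(child)  # first hit on this path; do not descend further
        else:
            matches.extend(skip_to(child, name))
    return matches


inner = Node("table")
outer = Node("table", [Node("tr", [Node("td", [inner])])])
other = Node("table")
doc = Node("body", [Node("center", [outer]), other])

found = skip_to(doc, "table")
print(len(found))
```

Because the search never descends into a matched element, the nested table `inner` is not returned; only the two outermost tables are, which is the minimization effect noted above.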

Attribute Conditions. An incompletely specified tree path may be too general 
for describing an intended extraction target. In that case, additional atoms in 
the rule body may express further restricting conditions. Among these are so- 
called attribute conditions. Attribute conditions impose restrictions on matched 
elements. For example, leaf nodes of the HTML tree representing text strings may 
have a font-style attribute which takes the value italics if the represented text 
is in italics. Moreover, we treat the contents of an element as special attribute 
elementtext. Consider the rule for tableseq in Figure 2: one of its predicates 
uses an attribute condition expressing that the elementtext needs to contain the 
word “Current” (“contain” due to the substr keyword). This attribute condition 
restricts the tree path .*.table, which identifies tables, by limiting the matches 
to those text fields that contain the word “Current”. Attribute conditions may 
require exact matches or partial matches, or satisfaction of a particular regular 
expression possibly extended by the use of variables. 
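The three matching modes can be sketched in Python as follows. This is an illustrative simplification: elements are reduced to attribute dictionaries, the mode name `regexp` is our own label, and the variable-extended form (regvar in Figure 2) is not modeled.

```python
import re
from typing import Dict, Tuple

AttrCondition = Tuple[str, str, str]  # (attribute name, required value, mode)


def satisfies(attrs: Dict[str, str], cond: AttrCondition) -> bool:
    """Check one attribute condition against the attributes of an element."""
    name, value, mode = cond
    if name not in attrs:
        return False
    actual = attrs[name]
    if mode == "exact":
        return actual == value
    if mode == "substr":
        return value in actual
    if mode == "regexp":
        return re.search(value, actual) is not None
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical elements, with the element contents treated as attribute elementtext.
header_row = {"elementtext": "Item  Current Price  Bids"}
link = {"href": "/item/1", "elementtext": "next page"}

print(satisfies(header_row, ("elementtext", "Current", "substr")))
```

Note that a condition such as (href, , substr) with an empty value holds for every element that has an href attribute at all, since the empty string is a substring of any value.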

Element Characterizations. A set of elements of a subtree of an HTML tree 
are identified with a tree path (starting from the subtree root), where addition- 
ally a set of attribute conditions is satisfied. Such a characterization is called 
an element path definition. Equivalently, XPath expressions can be used instead 
(with some extensions, such as the possibility to express that an attribute value 
is a concept like “isCity”). To simplify presentation, however, we stick to our 
introduced notation. A set of substrings can be identified by using a string path 
definition, which can either be a regular expression, or refer to a concept, or 
even combine both. Consider the example of Figure 2, in which the rule defining 
<currency> refers to a variable whose instances are currencies. 

Tree Extraction Definition Predicates. These predicates specify that a vari- 
able should be instantiated with a node in the HTML tree which matches an 
element path definition. See, for example, the subelem atom of the fourth rule in 
Figure 2, where the variable X is instantiated to all those text fields that occur 
within <record> and contain a link. The variable S in this atom denotes the 
super entity or, as we call it, the parent pattern, from which the current target 
should be extracted via subelem. This parent pattern instance is constrained to 
be an instance of <record> by the first atom of the rule. Note that the tree 
path specified in a tree extraction definition predicate is always relative to the 
parent pattern, i.e., its starting point is a node corresponding to the parent pat- 
tern (in our example rule, an instance of <record>). Moreover, with subregion, 
a sequence of elements can be extracted (e.g. used in tableseq in Figure 2). 

String Extraction Definition Predicates. In the HTML parse tree, strings 
are represented by the text of leaves of type content. However, we associate a 
string Cn to every node n of the parse tree by simply concatenating (in left- 
to-right order) all strings corresponding to leaves of the subtree rooted in n. 
The string associated to node n is available in the Lixto system as the 
value of an additional attribute elementtext of any given node n. Several special 
conditions that express restrictions on such elementtexts can be expressed in 
Elog. Elog predicates expressing such special string conditions are referred to 
as string extraction definition predicates. As an example, consider the final two 
rules of the program of Figure 2. The last rule uses a regular expression as string 
path definition, the other one a variable reference to a concept atom (explained 
below). Moreover, attribute extraction predicates such as subatt (see examples 
in Section 6) allow the contents of attribute values to be extracted. 
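The construction of the string associated with a node, and a regular-expression string path applied to it, can be sketched as follows. The `Node` class and the function `elementtext` are hypothetical; only the regular expression is taken from the last rule of Figure 2.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A toy parse-tree node; only leaves of type content carry text."""
    name: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def elementtext(node: Node) -> str:
    """Concatenate, in left-to-right order, the strings of all leaves
    of the subtree rooted at node."""
    if not node.children:
        return node.text
    return "".join(elementtext(child) for child in node.children)


td = Node("td", children=[
    Node("content", "$ "),
    Node("b", children=[Node("content", "12.50")]),
])

text = elementtext(td)
amounts = re.findall(r"[0-9]+\.[0-9]+", text)
print(text, amounts)
```

The regular expression plays the role of the string path definition in the pricewc rule: it selects the numeric part of a price from the elementtext of its parent instance.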

Contextual Conditions. Contextual conditions specify that some other ele- 
ments must or must not appear either before or after some instance. These con- 
textual elements are not limited to text elements. For example, on a page with 
several tables, the final table could be identified by an external condition stating 
that no table appears after the desired table. The rule defining a <tableseq> 
uses both an after and a before condition to express that one is interested in 
exactly the region between some specified elements. The definition of <date> 
uses a notafter condition to express that the column which contains the date is 
not followed by another column. 
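A much simplified sketch of before and notafter conditions is given below. It assumes that the elements of interest are listed in document order by their element names and ignores the distance parameters that the Elog predicates of Figure 2 carry; the function names merely mirror the Elog predicates and are not their actual implementation.

```python
from typing import Sequence


def before(elements: Sequence[str], index: int, name: str) -> bool:
    """True if an element called `name` occurs before position `index`."""
    return name in elements[:index]


def notafter(elements: Sequence[str], index: int, name: str) -> bool:
    """True if no element called `name` occurs after position `index`."""
    return name not in elements[index + 1:]


# Hypothetical page with several tables; select the final one.
page = ["table", "p", "table", "img"]
final_tables = [i for i, e in enumerate(page)
                if e == "table" and notafter(page, i, "table")]

# Hypothetical record with four columns; select the column not followed by another.
columns = ["td", "td", "td", "td"]
last_columns = [i for i in range(len(columns)) if notafter(columns, i, "td")]

print(final_tables, last_columns)
```

The second selection corresponds to the idea behind the <date> rule: among all columns of a record, only the one that is not followed by a further column qualifies.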

Internal Conditions. Such conditions require that some characteristic feature 
must or must not appear within an instance. Imagine that one wants to extract all 
tables containing a word typeset in italics. This could be obtained by adding 
an internal condition called contains to the body of the rule that defines the 
pattern <record>. This condition expresses that in the subtree rooted at the 
node representing the desired table row, a node must exist whose font-style 
attribute is defined and has the value italics. 

Concept Conditions. These predicates define concepts of some built-in 
top-level ontology. For example, one may enrich the system with predicates 
isEmail(X), isCountry(X), or isCurrency(X) (see Figure 2), stating that a 
string X represents an email address, a country, or a currency, respectively. 
These values of the variable X are created as output of concept attribute 
conditions or string path definitions (using \var[Y]). They are not required to be 
unary, e.g. isDate(X, Y) is a binary predicate with output Y in standard date 
format. 
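As an illustration of a unary and a binary concept predicate, consider the following Python sketch. The currency vocabulary, the accepted input formats, and the choice of ISO 8601 as the “standard date format” are assumptions made for this example only.

```python
from datetime import datetime
from typing import Optional

# Assumed, deliberately tiny vocabulary of currency symbols and codes.
CURRENCIES = {"$", "£", "USD", "GBP", "EUR"}

# Assumed input formats tried in order.
DATE_FORMATS = ("%d.%m.%Y", "%m/%d/%Y", "%Y-%m-%d")


def is_currency(x: str) -> bool:
    """Unary concept: does the string denote a known currency?"""
    return x in CURRENCIES


def is_date(x: str) -> Optional[str]:
    """Binary concept: return the date in normalized form (the output Y),
    or None if the string is not recognized as a date."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(x, fmt).date().isoformat()
        except ValueError:
            continue
    return None


print(is_currency("$"), is_date("17.09.2001"))
```

Returning the normalized value plays the role of the output argument Y, which can then be used by comparison conditions on dates.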

Comparison Conditions. These are predefined relations for predefined onto- 
logical classes of elements. Using these conditions, one can e.g. compare two dates 
(binary predicate), or require that an email address exists (unary predicate). 

Pattern References. Each standard filter contains a reference to its parent 
pattern which defines the context of a rule. For example, see the rule defining 
<itemdes> in Figure 2. It refers to <record> as parent. The substitution for S 
is the actual tree region which acts as parent instance. Moreover, additional 
pattern references can be used, for instance to express that an instance of some 
pattern always occurs after an instance of another pattern. Such additional pat- 
tern references open the way for reference recursion (see Section 7 for details). 

Range Conditions. A range condition further restricts the set of pattern in- 
stances extracted by a filter by selecting only a subset of the pattern instances 
which satisfy the conditions in the body of the filter. Indeed the pattern in- 
stances extracted from a certain parent pattern instance are ordered according 
to their position in the document, and a range condition selects only those pat- 
tern instances that belong to the required range of solutions. To any rule a range 
condition such as “[3,7]” can be added, indicating that the solution only includes 
the third up to the seventh matched target. Counting can occur starting with 
the first or with the last instance. 
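A range condition can be pictured as a slice over the matches in document order. The sketch below is hypothetical: in particular, the `from_last` flag is our own way of expressing counting from the last instance, since the concrete Elog notation for it is not given here.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def apply_range(instances: Sequence[T], start: int, end: int,
                from_last: bool = False) -> List[T]:
    """Keep the matches at 1-based positions start..end (inclusive).
    Positions are counted from the last match if from_last is set;
    the result is always returned in document order."""
    ordered = list(instances)
    if from_last:
        ordered.reverse()
    selected = ordered[start - 1:end]
    if from_last:
        selected.reverse()
    return selected


matches = [f"m{i}" for i in range(1, 10)]  # m1 .. m9 in document order

print(apply_range(matches, 3, 7))
print(apply_range(matches, 1, 2, from_last=True))
```

The first call corresponds to the range “[3,7]” from the text; the second selects the last two matches.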

Using the above predicates, a standard extraction rule looks as follows: 

New(S, X) ← Par(_, S), Ex(S, X), Co(S, X, ...)[a, b] 

where S is the parent instance variable, X is the pattern instance variable, 
Ex(S, X) is an extraction definition predicate, and the optional Co(S, X, ...) are 
further imposed conditions. A tree (string) extraction rule uses a tree (string) 
extraction definition atom and possibly some tree (string) conditions and general 
conditions. The numbers a and b are optional and serve as range parameters. Par 
and New are pattern predicates referring to the parent pattern and defining the 
new pattern, respectively. This standard rule reflects the principle of aggregation. 

The semantics of a rule is given as the set of matched targets x: a substitution 
s, x for S and X evaluates New(s, x) to true iff all atoms of the body are 
true for this substitution. Only those targets are extracted for which the head 
of the rule resolves to true. Moreover, if the extraction definition predicate is a 
subsequence predicate, only minimal instances are matched (i.e. instances that 
do not contain any other instances). This is a nonmonotonic concept discussed 
in Section 7. Observe that range criteria are applied after non-minimal targets 
have been sorted out. Note that range conditions are well-defined only in the 
case of no reference recursion (cf. Section 7). 
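The minimality criterion for subsequence matches can be sketched as follows, under the simplifying assumption that each candidate instance is an interval (start, end) of positions in the document. The function name `minimal_instances` is hypothetical.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # (start, end) positions, both inclusive


def minimal_instances(regions: List[Region]) -> List[Region]:
    """Keep only regions that do not properly contain another candidate region."""
    def contains(outer: Region, inner: Region) -> bool:
        return outer != inner and outer[0] <= inner[0] and inner[1] <= outer[1]

    return [r for r in regions
            if not any(contains(r, other) for other in regions)]


candidates = [(0, 10), (2, 5), (6, 9), (12, 15), (11, 20)]
minimal = minimal_instances(candidates)
print(minimal)
```

Adding a further candidate can remove a previously minimal instance from the result, which is why this step is nonmonotonic. A range condition, if present, would be applied to `minimal` afterwards, in line with the order stated above.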

A pattern definition (for short, pattern) is a set of extraction rules defining the 
same head. We distinguish document, tree and string patterns. To tree patterns, 
only tree extraction rules can be asserted, and to string patterns only string 
extraction rules. The third kind of patterns, document patterns, are discussed 
in the next section. A pattern acts like a disjunction of rule bodies: To be an 
extracted instance of a pattern, a target needs to be in the solution set of at 
least one rule. The set of matched target instances of a pattern additionally obeys 
a minimality criterion (see Section 7). In patterns, even in those consisting of 
a single rule, overlapping targets may occur. Observe that we do not pose the 
requirement that each rule belonging to a given pattern refers to the same parent 
pattern. This, together with the capability of document navigation, allows for 
recursion over patterns as explained in more detail in Section 6. 

An extraction program P is a set of patterns. Elog program evaluation differs 
from Datalog evaluation in the following three aspects: The use of built-in pred- 
icates, various kinds of minimization, and the use of range conditions. Moreover, 
atoms are not evaluated over an extensional database of facts representing a 
Web page, but directly over the parse tree of the Web page. 

The application of a program to an HTML page creates a set of hierarchically 
ordered tree regions and string sources (called a pattern instance base) by applying 
all patterns of the program to the given and possibly further HTML pages 
(see the notion of document filters in Section 4). Each pattern produces a set 
of instances. Each pattern instance contains a reference to its parent instance. 
Observe that the pattern instance base always forms a forest, regardless of the 
structure of the pattern graph. We consider the instances of document filters as 
the root nodes of the trees of this forest. The pattern instance base can be translated 
into XML as already described in Section 2. 

4 A Closer Look at some Lixto Features 

In this section, we discuss some more advanced features of Lixto, in particular 
two further kinds of rules. A standard rule reflects the principle of aggregation; 
however, designers of wrappers sometimes wish to express specialization. For 
instance, if one rule extracts a set of tables, it might be desirable to create a 
rule which restricts the extracted tables to those which contain some particular 
feature. A specialization rule looks as follows: 

New(S, X) ← Old(S, X), Co(S, X, ...)[a, b] 

In such a rule a pattern is specialized, i.e. some of the parent-pattern instances 
are returned as pattern instances of the new pattern definition. It does 
not contain a parent-pattern reference and an extraction definition atom; instead 
it only contains a pattern reference. Observe that, analogously to specialization 
rules, generalization rules can be used by simply creating multiple specialization 
rules for one pattern which refer to different patterns and do not contain any 
conditions. Another kind of rule is the document rule, using a getDocument(S, X) 
atom, where S is a string source representing an URL, and X the Web page the 
URL points to. With such rules, one can crawl to further documents. 



New(S, X) ← Par(_, S), getDocument(S, X) 

Each Elog program has an initial rule using the getDocument atom with 
user-specified input. The initial document rule is the only rule without a parent- 
pattern reference. Instead, it uses a variable “$1” (or a fixed URL) which is 
instantiated to a string source representing an URL during run time (the start 
document). Document filters can be applied to document patterns only. Parents 
of tree patterns are either tree or document patterns, parents of string patterns 
are tree or string patterns, and parents of document patterns are string patterns. 
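The admissible parent kinds stated above can be written down as a small lookup table. The sketch below is hypothetical code; only the table contents are taken from the text, and the initial document rule, which has no parent, is outside its scope.

```python
from typing import Dict, Set

# For each kind of pattern, the kinds of patterns allowed as its parent.
ALLOWED_PARENTS: Dict[str, Set[str]] = {
    "tree": {"tree", "document"},
    "string": {"tree", "string"},
    "document": {"string"},
}


def valid_parent(child_kind: str, parent_kind: str) -> bool:
    """Check whether a pattern of child_kind may have a parent of parent_kind."""
    return parent_kind in ALLOWED_PARENTS[child_kind]


# A string pattern holding a URL under a tree pattern, and a document
# pattern reached from that string pattern.
print(valid_parent("string", "tree"), valid_parent("document", "string"))
```

A document pattern can therefore never be the direct child of a tree pattern: the URL must first be extracted as a string source, for instance with an attribute filter.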

Figure 4 illustrates the use of document rules together with specialization 
rules. This example moreover illustrates the use of disjunctive pattern definitions 
pointing to two different parents, which in this case evolved from 
two different kinds of documents. Consider the root pattern <document> and 
its child patterns <ebaydocument> and <yahoodocument>. Both are specializations 
requiring that the document is an eBay page (a category search result on 
www.ebay.com such as http://listings.ebay.com/aw/plistings/list/all/category3707/index.html) 
or a yahoo auctions page (i.e., a search result of 
auctions.yahoo.com), respectively. Observe that the patterns <ebaydocument> 
and <yahoodocument> are not document patterns, but tree patterns, since they 
refer to instances of tree regions. The predicate contains is an internal condition, 
expressing that there is an element in X which satisfies the given element path 
definition. 

Fig. 4. Wrapper for eBay/Yahoo using Specialization and Disjunctive Patterns 

document(S, X) ← getDocument(S = $1, X)
ebaydocument(S, X) ← document(S, X),
    contains(X, (*.body, [(elementtext, eBay, substr)]), _)
yahoodocument(S, X) ← document(S, X),
    contains(X, (*.body, [(elementtext, Yahoo, substr)]), _)

5 Disjunctive Pattern Construction 

There are several cases where it is necessary to define more than one filter for 
the same pattern to express how to extract desired pieces of information from a 
Web page. In this section we show some real world examples where it is useful to 
define a pattern using a disjunction of filters. Moreover, we show that it is generally 
possible that different filters of the same pattern can extract information from 
different parent patterns. Let us first consider an example where a wrapper 
designer wants to define a pattern consisting of filters that describe extraction 
targets for different page types. Assume a wrapper extracts prices from two kinds 
of Web pages displaying books and their prices, where pages of the first kind 
are US pages and pages of the second kind are UK pages. The characteristic 
features of prices are a dollar sign on US Web pages and a pound sterling sign 
on UK pages. Assume, furthermore, the current sample page is a US page. A 
pattern named <price> should thus be defined via two filters: the first taking 
care of US pages and the second of UK pages. After having visually created an 
appropriate filter for prices in USD on an already loaded US sample page, the 
designer switches to a UK sample page and visually defines the second filter for 
the <price> pattern on that page. The wrapper then works on both types of 
pages. 

In Lixto it is not only possible to create a pattern consisting of several filters, 
but also to let the filters of a particular pattern definition refer to different parent 
patterns. Again, consider the example in Figure 4. For both the <ebaydocument> 
and the <yahoodocument> pattern we now have to extract the list of available 
items (records). Since records are structured differently in eBay and yahoo auctions, 
it is necessary to create for each kind of page a record pattern of its own, 
i.e. <ebayrecord> and <yahoorecord>. Once we have defined the patterns for 
the records, the patterns <itemdes>, <price>, <bids> and <date> can be easily 
defined with one filter for each kind of record. Although this wrapper works fine 
for both yahoo and eBay auctions, it still only returns results from one summary 
page as it does not follow the “next” link, and also is not capable of extracting 
detail information. Moreover, using the pattern <itemdes> as parent, a string 
pattern URL is defined using an attribute filter. This attribute filter extracts the 
value of the link to detailed information of the particular item. This attribute 
filter works for both sites, since both store the URL pointing to the detail page 
in the corresponding href attribute. 



URL(S, X) ← itemdes(_, S), subatt(S, href, X) 

An attribute filter uses the extraction definition predicate subatt to extract 
an attribute value of instances of S and instantiates a string source X with it. 
The following additional features are currently implemented and can be added 
via Lixto’s XML Tool: 

1. The patterns <ebayrecord> and <yahoorecord> can both be mapped to the 
XML element <record>, and an attribute source of <record> can be defined, 
which takes the constant value eBay or yahoo, respectively. 

2. In case the string source of <URL> is a relative URL, a prefix variable (BASE) 
can be added to it, which has the value of the base URL of the document 
from which the information is extracted. This variable can also be used for 
following relative links when crawling to further pages (see next section). 

3. Auxiliary patterns such as <ebaydocument> and <yahoodocument> can be 
excluded from the XML mapping, and a DTD can be created by additionally 
assigning a multiplicity to each data type (Figure 4). 




Fig. 5. eBay item description page 



6 Web Crawling and Recursive Wrapping 

6.1 Following Links 

For each item, eBay pages contain a reference to a page containing detailed 
information about the item itself. In the previous section, we have shown how to 
extract the URL pointing to detail pages, but we did not further use it. In this 
section we extend the wrapper program to extract also the detailed description 
of each item. This is an instance of a general class of applications, where a 
wrapper needs to collect and group together elements from several pages. The 
wrapper designer thus needs to “teach” the system on the basis of sample pages 
how to follow URLs and collect the elements from the different pages. On eBay, 
each item is described by a line stating summary information for each given 
auction item. Each such line contains a link to a Web page with more detailed 
information on the respective item, such as the seller name and the shipping 
information (Figure 5). 

The designer adds a child document pattern <detaildocument> to the string 
pattern <URL> which resulted from extracting the value of the href attribute 
of <itemdes>. For this, the designer proceeds by following one example detail 
document, loading the corresponding page, and defining the remaining relevant 
patterns (such as “sellername” and “shippinginfo” ) as child patterns of this new 
document pattern. Figure 6 illustrates an expanded Elog program of Figure 3, 
which defines an attribute filter extracting an URL (as in Figure 4), and a further 
document pattern consisting of one filter to extract detailed information for each 
item. The auxiliary patterns <URL> and <detaildocument> are not mapped to 
XML via the XML translation scheme. The navigation to a detail document 
looks as follows: 



URL(S, X) ← itemdes(_, S), subatt(S, href, X) 
detaildocument(S, X) ← URL(_, S), getDocument(S, X) 




Fig. 6. Following Links 
6.2 Recursive Wrapping 

As we have already pointed out, each filter of a given pattern may refer to 
a different parent pattern. Here, we show how to apply this feature to reuse 
patterns. This paves the way for creating recursive programs. We call this kind 
of recursion pattern recursion. Another kind of recursion, reference recursion, 
based on pattern references is discussed in Section 7. 

Let us first consider the example program below. 

document(S, X) ← getDocument($1, X) 
table(S, X) ← document(_, S), subelem(S, .*.table, X) 
table(S, X) ← table(_, S), subelem(S, .*.table, X) 

It extracts all nested tables within one page, starting with the outermost, and 
stores them in this hierarchical order in the pattern instance base. The second 
rule of <table> is iteratively called, until no further table can be extracted. 
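The effect of the two <table> rules can be imitated in Python on a toy tree. This sketch is not how Elog programs are evaluated; the `Node` class, its `label` field, and both functions are hypothetical, and the pattern instance base is reduced to a list of (parent, instance) pairs.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A toy element node; `label` only serves to tell instances apart."""
    name: str
    label: str
    children: List["Node"] = field(default_factory=list)


def skip_to(root: Node, name: str) -> List[Node]:
    """Evaluate .*.name below root, stopping at the first hit on each path."""
    matches: List[Node] = []
    for child in root.children:
        if child.name == name:
            matches.append(child)
        else:
            matches.extend(skip_to(child, name))
    return matches


def extract_tables(parent: Node, base: List[Tuple[str, str]]) -> None:
    """Record the outermost tables below parent, then recurse into each table,
    mirroring the rule whose parent pattern is <table> itself."""
    for table in skip_to(parent, "table"):
        base.append((parent.label, table.label))
        extract_tables(table, base)


t3 = Node("table", "t3")
t2 = Node("table", "t2", [Node("tr", "r2", [Node("td", "d2", [t3])])])
t1 = Node("table", "t1", [Node("tr", "r1", [Node("td", "d1", [t2])])])
t4 = Node("table", "t4")
doc = Node("body", "doc", [t1, t4])

instance_base: List[Tuple[str, str]] = []
extract_tables(doc, instance_base)
print(instance_base)
```

Each pair records the parent instance of a table, so the nesting of the tables is preserved as a hierarchy, and the recursion stops when a table contains no further table.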

Another possible use of recursively defined wrappers is the following real- 
world application. Usually a wrapper designer does not want to extract data 
from a single eBay page on notebooks, but from all pages which are connected 
to each other via a “next page” link. We illustrate how the eBay program of 
Figure 6 can be extended to follow the next link and can reuse the already 
created pattern structure. Thus, the pattern <ebaydocument> is a document 
pattern consisting of two filters with different parents. The first one refers to 
the specified start document, whereas the second one follows the “next” link on 
each page. This part of the program looks as follows: 

next(S, X) ← ebaydocument(_, S), 
        subelem(S, (*.content, [(href, , substr), 
                (elementtext, (next page), exact)]), X) 
nexturl(S, X) ← next(_, S), subatt(S, href, X) 
ebaydocument(S, X) ← getDocument(S = $1, X) 
ebaydocument(S, X) ← nexturl(_, S), getDocument(S, X) 

Recall that “$1” is interpreted as a constant whose value is the URL of 
the start document of a Lixto session. This initial filter was already present in 
the previous example, and is the starting point of evaluation. The second filter 
refers to a different parent pattern, which is <nexturl>. Instances of the pattern 
<nexturl> are string sources which represent a URL. The pattern <nexturl> 
is created via an attribute filter which extracts via subatt the value of “href” 
present in the element which contains the text “next page” . 

In the second rule defining the pattern <ebaydocument>, the variable S is 
instantiated with string sources which represent URLs. For each “next” link, 
a new instance of <ebaydocument> is created, pointing to the next page. This 
new page serves as parent pattern for <tableseq> and <next>. The pattern 
structure is hence re-used for this new page. In this example, two different 
document patterns are used: on the one hand <ebaydocument>, on the other hand 
<detaildocument>. Instances of the pattern <ebaydocument> are the summary 
pages, whereas instances of <detaildocument> are the detail information pages 
for each item. In an XML translation scheme, the wrapper designer moreover 
wants to state how the documents are arranged inside the XML document. Al- 
though further instances of <ebaydocument> are hierarchically embedded in the 
previous one, the wrapper designer may maintain all <record> instances on the 
same level. 
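
The control flow of the recursive <ebaydocument> pattern can be pictured as a loop that follows “next page” links until none is left. The sketch below is a simplification written for illustration: the in-memory table `PAGES`, the example.org URLs, and the function names are hypothetical, and no real network access or Lixto API is involved.

```python
# Hypothetical in-memory "web": URL -> (records on that page, URL of the next page or None).
PAGES = {
    "http://example.org/p1": (["notebook A", "notebook B"], "http://example.org/p2"),
    "http://example.org/p2": (["notebook C"], "http://example.org/p3"),
    "http://example.org/p3": (["notebook D"], None),
}

def get_document(url):
    """Stand-in for getDocument: return the content stored for a URL."""
    return PAGES[url]

def crawl(start_url):
    """Collect the records of all pages reachable through 'next' links.

    The first filter corresponds to loading `start_url`; the second filter
    corresponds to loading each extracted next-URL. All records are kept in
    one flat list, i.e. on the same level.
    """
    records, seen, url = [], set(), start_url
    while url is not None and url not in seen:   # `seen` guards against link cycles
        seen.add(url)
        page_records, next_url = get_document(url)
        records.extend(page_records)
        url = next_url
    return records

all_records = crawl("http://example.org/p1")
```

Keeping the records in a single flat list corresponds to the choice, mentioned above, of maintaining all <record> instances on the same level although the document instances themselves are nested.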

In the visual interface of Lixto, a document pattern can be generated without 
the need to manually define auxiliary patterns. Instead visual guidance is offered 
for creating a single rule which uses a sequence of extraction definition predicates. 
For this example program, this single rule can be represented as follows: 

ebaydocument(S, X) ← ebaydocument(_, S), 
        subelem(S, (*.content, [(href, , substr), 
                (elementtext, (next page), exact)]), Y), 
        subatt(Y, href, Z), getDocument(Z, X) 




7 Nonmonotonic Issues 

Minimization of pattern instances. The set of matched targets of an Elog pattern 
is minimized in such a way that pattern instances which contain other instances of 
the same pattern w.r.t. the same parent-pattern instance are ignored. Pattern 
minimization applies both to tree and string rules. If a pattern consists of a 
single filter, the minimized set of its matched targets equals the initial set except 
if the extraction definition predicate of the filter is subregion (which extracts a 
sequence of elements) . 
Consider the following simple example. Assume the major headlines of a 
particular newspaper Web page are a table consisting of various table data (the 
wrapper designer is interested in all the contents), and the minor headlines of the 
same newspaper, which appear on the same page, are table columns of another 
table. The minor headlines are moreover characterized by a red font, and the 
major headlines contain a link (href) somewhere. However, the table containing 
the data of all minor headlines also contains links (i.e. the href attribute is 
a characteristic attribute occurring in these two tables only). A program for 
extracting all headlines can be written in the following way, where par is the 
parent pattern identifying the relevant area of the newspaper page. 

headline(S, X) ← par(_, S), subelem(S, .*.table, X), 
        contains(S, (.content, [(href, , substr)]), X) 
headline(S, X) ← par(_, S), subelem(S, .*.td, X), 
        contains(S, (.content, [(font-color, red, exact)]), X) 

Hence, the first rule also matches the table which contains all minor headlines. 
However, since in this table, other pattern instances are matched, too, only the 
minimal instances are returned, which are in this case the table columns. For 
the major headlines, however, the table is extracted. Another example is the 
minimization of the set of instances generated by a single rule: 

tableseq(S, X) ← par(_, S), subregion(S, .*.body.*.center, .table, .table, X) 



Such a rule (with additional conditions) is used in the eBay program of 
Figure 2. However, with no additional condition, the semantics is to extract 
all possible sequences of tables and to minimize the result; since the minimal 
sequences of tables are sequences of a single table, this rule returns such instances 
only. To enforce a particular longer sequence of tables, such as the sequence of 
tables containing the relevant data of sold items, some before and after conditions 
need to be added. In the case of eBay, immediately before and after the target 
instance a particular text or image shall occur, respectively. This returns a single 
pattern instance, the sequence of desired record tables. 

Pattern minimization can be expressed in Elog extended with stratified 
negation and a suitable built-in predicate contained_in(X, Y) expressing offset-wise 
containment of X in Y. In particular, a set of filters of p(S, X) defining the 
pattern p is rewritten in the following way. Consider the initial pattern definition: 

p(S, X) ← par_1(_, S), Ex_1(S, X), Co_1(S, X, ...) 
   ... 
p(S, X) ← par_n(_, S), Ex_n(S, X), Co_n(S, X, ...) 

The pattern name is renamed to p' and additional rules are added: 
p'(S, X) ← par_1(_, S), Ex_1(S, X), Co_1(S, X, ...) 
   ... 
p'(S, X) ← par_n(_, S), Ex_n(S, X), Co_n(S, X, ...) 
p''(S, X) ← p'(S, X), p'(S, X_1), contained_in(X_1, X) 
p(S, X) ← p'(S, X), not p''(S, X) 
The rule defining p'' requires that the instances X and X_1 are both from the same 
parent pattern instance (otherwise, if they stem from different parent-pattern 
instances, minimization is usually undesired). In the rewriting, p' is the pattern 
predicate initially being built by different filters. Each instance p'(s, x) which is 
non-minimal, i.e. for which p''(s, x) holds because a smaller valid instance exists, 
is not derived as p(s, x). Only minimal instances are derived. 
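
The effect of this rewriting can be reproduced on a small set of instances. The Python sketch below is illustrative only: it represents a pattern instance as a pair of a parent identifier and a (start, end) character span, and it assumes that contained_in means proper containment of spans, which the text does not spell out.

```python
def contained_in(inner, outer):
    """Offset-wise containment of span `inner` in span `outer` (assumed to be proper)."""
    return outer[0] <= inner[0] and inner[1] <= outer[1] and inner != outer

def minimize(instances):
    """Return the minimal instances of a pattern.

    `instances` plays the role of p': a set of (parent, span) pairs.
    `non_minimal` plays the role of p'', and the result plays the role of p.
    """
    non_minimal = {
        (s, x)
        for (s, x) in instances
        for (s1, x1) in instances
        if s == s1 and contained_in(x1, x)   # same parent instance only
    }
    return instances - non_minimal

# Hypothetical headline example: one table span containing two column spans,
# plus a separate table span that contains no other instance.
p_prime = {
    ("par", (0, 100)), ("par", (10, 20)), ("par", (30, 40)),
    ("par", (200, 300)),
}
minimal = minimize(p_prime)
```

In the hypothetical data, the span (0, 100) is dropped because it contains two other instances, while (200, 300) survives, which matches the behaviour described for the minor and major headlines.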

Ranges. The semantics of range criteria [a, b] of a filter rule NewPat(S, X) ← 
filterbody[a, b] can also be expressed by a suitable rewriting of the rule. A range 
condition assumes that an order relation is defined among pattern instances 
extracted by the same parent pattern instance, thus in the rewriting we assume 
the presence of a predicate greater(S, X, Y) which evaluates to true if X and Y 
are instances derived from S and X precedes Y (using character offsets for 
comparison). The first step of rewriting consists of adding a new predicate NewPat' 
that is defined by a unique filter NewPat'(S, X) ← filterbody. Then, two 
predicates FirstSol and succ are defined. FirstSol selects from the instances in 
NewPat' the first instance, and succ defines a successor relation among 
instances in NewPat' (due to the lack of space we omit the formal definition). 
The complete rewriting is as follows: 

NewPat(S, X) ← NewPat'(S, X), Solposition(S, X, P), a ≤ P ≤ b 
Solposition(S, X, 1) ← NewPat'(S, X), FirstSol(S, X) 
Solposition(S, X, P) ← Solposition(S, X', P'), NewPat'(S, X), succ(S, X', X), P = P' + 1. 

In both predicates FirstSol and succ, the predicate NewPat' appears 
negated, hence, the predicate NewPat depends on negation of all the predi- 
cates appearing in filterbody. 
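
Operationally, the rewriting numbers the instances of each parent in document order and keeps those whose position falls into the range. A minimal Python sketch of this reading follows; it assumes that both range bounds are inclusive and that an instance is identified by its start offset, and the function name `apply_range` is hypothetical.

```python
def apply_range(instances, a, b):
    """Keep the instances whose 1-based position among the instances of the
    same parent lies in [a, b].

    `instances` is an iterable of (parent, start_offset) pairs and plays the
    role of NewPat'; the enumeration below plays the role of Solposition.
    """
    by_parent = {}
    for s, x in instances:
        by_parent.setdefault(s, []).append(x)
    selected = set()
    for s, offsets in by_parent.items():
        # sorted() realizes the order given by FirstSol and succ.
        for position, x in enumerate(sorted(offsets), start=1):
            if a <= position <= b:
                selected.add((s, x))
    return selected

new_pat_prime = {("s1", 5), ("s1", 50), ("s1", 20), ("s2", 7)}
second_and_third = apply_range(new_pat_prime, 2, 3)
```

Positions are counted separately for each parent instance, so the single instance of the hypothetical parent "s2" has position 1 and is not selected by the range [2, 3].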

Pattern Reference Recursion and Ranges. Using ranges together with pattern 
references might introduce unstratified negation. Using pattern references can 
introduce reference recursion. Still, without ranges, a unique model is returned. 
However, additionally allowing range conditions to occur in such recursive rules 
requires a semantics akin to the stable model semantics (returning multiple 
models) or the well-founded semantics (returning a minimal model), as this 
introduces unstratified negation into the program (considering the above rewriting). 
For the following example (possibly containing additional filters for p and q), a 
nonmonotonic semantics is required. 

p(S, X) ← par(_, S), subelem(S, epd, X), before(S, X, ..., Y), q(S, Y)[a, b] 
q(S, X) ← par(_, S), subelem(S, epd, X), before(S, X, ..., Y), p(S, Y)[c, d] 

Observe that a program which uses range and pattern recursion, but no 
reference recursion, is always locally stratified, i.e. its ground instantiation is 
stratified. For implementation reasons, we limit pattern references in such a way 
that the program remains locally stratified. This is a subset of programs whose 
rewriting contains only stratified negation. 



Declarative Information Extraction, Web Crawling, and Recursive Wrapping 






8 Current/Future Work 

Future work includes considering various extensions of Elog, such as using 
stratified negation instead of special negative predicates like notbefore, extending 
the handling of pattern references together with recursion as discussed above, 
studying further possibilities of conditions such as universally quantified ones (that 
require all elements to have a particular feature), and complement extraction 
(e.g. to remove advertisements from Web pages). An editor of Elog rules will be 
offered for more experienced wrapper designers who nevertheless lack 
programming skills. This editor describes Elog patterns using a colloquial pattern 
description language. A concept editor for adding syntactic and semantic 
concepts to the list of built-in predicates is currently under construction. Moreover, 
the Lixto prototype is currently being re-designed as a servlet version allowing 
pattern generation in the user’s favorite browser. Finally, an Elog2XSLT 
conversion tool is going to be developed which will transform a subset of possible 
Elog programs into XSLT. 
Abstract. Every logical formalism gives rise to two fundamental 
algorithmic problems: model checking and inference. In propositional logic, 
the model checking problem is polynomial-time solvable, while the 
inference problem is coNP-complete. In propositional circumscription, 
however, these problems have higher computational complexity, namely the 
model checking problem is coNP-complete, while the inference problem 
is $\Pi_2^P$-complete. In this paper, we survey recent results on the 
computational complexity of restricted cases of these problems in the context of 
Schaefer’s framework of generalized satisfiability problems. These results 
establish dichotomies in the complexity of the model checking problem 
and the inference problem for propositional circumscription. Specifically, 
in each restricted case the model checking problem for propositional 
circumscription either is coNP-complete or is polynomial-time solvable. 
Furthermore, in each restricted case the inference problem for 
propositional circumscription either is $\Pi_2^P$-complete or is in coNP. These 
dichotomy theorems yield a complete classification of the “hard” and the 
“easier” cases of the model checking problem and the inference 
problem for propositional circumscription. Moreover, they provide efficiently 
checkable criteria that tell apart the “hard” cases from the “easier” ones. 



1 Introduction 

Circumscription, introduced by McCarthy [McC80], is one of the most well de- 
veloped and extensively studied formalisms of nonmonotonic reasoning. In cir- 
cumscription, formulas of a logic are used to specify properties of objects, models 
of formulas are ordered according to a suitable partial order, and preference is 

* Part of this research was carried out while on sabbatical at the University of 
California, Santa Cruz. Research partially supported by the Research Committee of 
the University of Patras and by the Computer Technology Institute. 

** Research partially supported by NSF grants CCR-9610257 and CCR-9732041. 
given to models that are minimal with respect to this partial order. The key 
intuition behind the focus on minimal models is that they are the ones that embody 
common sense, because they have as few “exceptions” as possible. Consequently, 
circumscription can be thought of as an application of Ockham’s razor principle 
(principle of parsimony) to the formalization of common-sense reasoning. 

Propositional circumscription is the basic case of circumscription in which 
satisfying truth assignments of propositional formulas are partially ordered 
according to the coordinatewise partial order $\le$ on Boolean vectors, which extends 
the order $0 \le 1$ on $\{0, 1\}$. Specifically, if $\alpha = (a_1, \dots, a_n)$ and 
$\beta = (b_1, \dots, b_n)$ are two truth assignments, then $\alpha \le \beta$ holds if 
$a_i \le b_i$ for every $i$ such that $1 \le i \le n$. A minimal model of a 
propositional formula $\varphi$ is a truth assignment $\alpha$ such that the following two 
conditions hold: (1) $\alpha(\varphi) = 1$; (2) if $\beta$ is a truth assignment such that 
$\beta(\varphi) = 1$ and $\beta \le \alpha$, then $\alpha = \beta$. For example, the minimal 
models of the formula $(x \lor y) \land (\lnot x \lor y) \land (x \lor \lnot y)$ are $(0, 1)$ and $(1, 0)$. 
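
The definition of a minimal model can be checked mechanically on small formulas by enumerating all truth assignments. The following Python sketch does this by brute force and is meant only to make the definition concrete; the helper names and the sample formula $x \lor y$ are choices of this sketch, and the exponential enumeration is not an efficient algorithm.

```python
from itertools import product

def leq(a, b):
    """Coordinatewise order on Boolean vectors: a <= b in every coordinate."""
    return all(x <= y for x, y in zip(a, b))

def models(formula, n):
    """All satisfying truth assignments of `formula` over n variables."""
    return [a for a in product((0, 1), repeat=n) if formula(*a)]

def minimal_models(formula, n):
    """Satisfying assignments with no strictly smaller satisfying assignment."""
    sat = models(formula, n)
    return [a for a in sat if not any(b != a and leq(b, a) for b in sat)]

# Sample formula chosen for this sketch: x OR y.
# Its models are (0,1), (1,0), (1,1); only the first two are minimal.
disjunction_minimal = minimal_models(lambda x, y: x or y, 2)
```

The assignment (1, 1) satisfies the sample formula but is not minimal, since both (0, 1) and (1, 0) lie below it in the coordinatewise order.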

Every logical formalism gives rise to two fundamental decision problems: 
model checking and inference. Intuitively, the former is the problem of deciding 
whether a “structure” satisfies a “formula”, whereas the latter is the problem 
of deciding whether a “formula” can be inferred from another “formula” in the 
context of the formalism under consideration. In the case of propositional cir- 
cumscription, these two problems take the following precise form. 

Definition 1: The model checking problem for propositional circumscription 
asks: given a propositional formula $\varphi$ and a truth assignment $\alpha$, is $\alpha$ a minimal 
model of $\varphi$? 

The inference problem for propositional circumscription asks: given two 
propositional formulas $\varphi$ and $\psi$, is $\psi$ true in every minimal model of $\varphi$? 

We write $\varphi \models_{\mathrm{CIRC}} \psi$ to denote that $\psi$ is true in every minimal model of $\varphi$. 

□ 
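
Both problems of Definition 1 can be stated as short brute-force procedures, which may help to see why inference under circumscription differs from classical inference. The sketch below is illustrative and exponential in the number of variables; the function names `mc_circ` and `inf_circ` and the sample formulas are assumptions of this sketch.

```python
from itertools import product

def minimal_models(phi, n):
    """Brute-force minimal models of `phi` over n variables."""
    sat = [a for a in product((0, 1), repeat=n) if phi(*a)]
    def leq(a, b):
        return all(x <= y for x, y in zip(a, b))
    return [a for a in sat if not any(b != a and leq(b, a) for b in sat)]

def mc_circ(phi, n, alpha):
    """Model checking: is `alpha` a minimal model of `phi`?"""
    return tuple(alpha) in minimal_models(phi, n)

def inf_circ(phi, psi, n):
    """Inference: is `psi` true in every minimal model of `phi`?"""
    return all(psi(*m) for m in minimal_models(phi, n))

def classically_entails(phi, psi, n):
    """Ordinary propositional inference, for comparison."""
    return all(psi(*a) for a in product((0, 1), repeat=n) if phi(*a))

phi = lambda x, y: x or y              # sample premise
psi = lambda x, y: not (x and y)       # sample conclusion: not both true

circ_result = inf_circ(phi, psi, 2)
classical_result = classically_entails(phi, psi, 2)
```

For the sample formulas, the conclusion holds in both minimal models of the premise but fails in the non-minimal model (1, 1), so it follows under circumscription although it does not follow classically.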

It has been shown that the model checking problem for propositional 
circumscription is coNP-complete (Cadoli [Cad92]), whereas the inference problem 
for propositional circumscription is $\Pi_2^P$-complete¹ (Eiter and Gottlob [EG93]). 
In fact, the model checking problem for propositional circumscription remains 
coNP-complete even when restricted to 3CNF-formulas, while the inference 
problem $\varphi \models_{\mathrm{CIRC}} \psi$ for propositional circumscription remains $\Pi_2^P$-complete even 
when $\varphi$ is a 3CNF-formula and $\psi$ is a negative literal $\lnot u$. These results quantify 
the increase in computational complexity that arises when making the transition 
from ordinary propositional logic to propositional circumscription, since in the 
case of ordinary propositional logic the model checking problem is solvable in 
linear time and the inference problem is coNP-complete (Cook [Coo71]). 
Moreover, these results raise the problem of identifying restricted cases in which the 
model checking problem and the inference problem for propositional 
circumscription have computational complexity lower than the general case. To this effect, 
Cadoli [Cad92,Cad93] found several polynomial-time solvable cases of the model 

¹ The class $\Pi_2^P$ forms the second level of the polynomial hierarchy PH, and contains 
both NP and coNP as subclasses (see [Pap94]). 
checking problem for propositional circumscription; in a similar vein, Cadoli 
and Lenzerini [CL94] studied restricted cases in which the inference problem for 
propositional circumscription is polynomial-time solvable or is in coNP. 

In studying restricted cases of an algorithmic problem, ideally one would 
like to have a rich conceptual framework that makes it possible to express a 
variety of restricted cases and analyze their complexity. For Boolean 
satisfiability, such a framework was introduced and investigated by Schaefer [Sch78], who 
succeeded in obtaining a complete classification of the complexity of Boolean 
satisfiability problems in this framework. Cadoli [Cad92,Cad93] proposed that 
the model checking problem for propositional circumscription be investigated in 
Schaefer’s framework and raised the question of whether it is possible to obtain 
a complete classification of its complexity. This question was settled 
affirmatively in [KK01a]; moreover, in [KK01b] the complexity of the inference problem 
for propositional circumscription was investigated in the context of Schaefer’s 
framework and a characterization of the $\Pi_2^P$-complete cases was obtained. 

The balance of this extended abstract is organized as follows. In Section 2, 
we present Schaefer’s framework and state his main results on the complex- 
ity of Boolean satisfiability problems. In Section 3, we describe our results on 
the complexity of the model checking problem and the inference problem for 
propositional circumscription in the context of Schaefer’s framework. Finally, in 
Section 4 we discuss certain open problems and directions for future research. 



2 Schaefer’s Framework for Boolean Satisfiability 

A logical relation $R$ is a non-empty subset of $\{0, 1\}^k$, for some $k \ge 1$. Such a 
logical relation can be thought of as the set of all satisfying truth assignments of 
a generalized propositional connective $R'$. Schaefer [Sch78] investigated Boolean 
satisfiability problems in which the inputs are formulas in generalized conjunctive 
normal form, that is to say, they are conjunctions of atomic formulas derived 
from a fixed finite set of logical relations. 

Definition 2: Let $S = \{R_1, \dots, R_m\}$ be a finite set of logical relations of various 
arities, let $S' = \{R'_1, \dots, R'_m\}$ be a set of relation symbols whose arities match 
those of the relations in $S$, and let $V$ be an infinite set of variables. 

- A CNF(S)-formula is a finite conjunction $C_1 \land \dots \land C_n$ of clauses built 
using relation symbols from $S'$, variables from $V$, and the constants 0 and 1, 
that is, each $C_i$ is an atomic formula of the form $R'_j(w_1, \dots, w_k)$, where $R'_j$ is 
a relation symbol of arity $k$ in $S'$, and each $w_l$ is a variable in $V$ or one of 
the constants 0 and 1. The semantics of CNF(S)-formulas are defined in a 
standard way by assuming that variables range over the set of bits $\{0, 1\}$, each 
relation symbol $R'_j$ in $S'$ is interpreted by the corresponding relation $R_j$ in $S$, 
and the constant symbols 0 and 1 are interpreted by 0 and 1 respectively. 

- Sat(S) is the following decision problem: given a CNF(S)-formula $\varphi$, is it 
satisfiable? (i.e., is there a truth assignment to the variables of $\varphi$ that makes 
every clause of $\varphi$ true?) □ 
It is clear that, for each finite set S of logical relations, Sat(S) is a problem in 
NP. Moreover, the family of all Sat(S) problems contains several well-known 
variants of Boolean satisfiability, as evidenced by the following examples. 

Example 3: 3-Sat, the prototypical NP-complete problem, coincides with the 
problem Sat(S), where $S = \{R_0, R_1, R_2, R_3\}$ and 

- $R_0 = \{0, 1\}^3 \setminus \{(0, 0, 0)\}$ (expressing the clause $(x \lor y \lor z)$); 

- $R_1 = \{0, 1\}^3 \setminus \{(1, 0, 0)\}$ (expressing the clause $(\lnot x \lor y \lor z)$); 

- $R_2 = \{0, 1\}^3 \setminus \{(1, 1, 0)\}$ (expressing the clause $(\lnot x \lor \lnot y \lor z)$); 

- $R_3 = \{0, 1\}^3 \setminus \{(1, 1, 1)\}$ (expressing the clause $(\lnot x \lor \lnot y \lor \lnot z)$). 

Similarly, but on the side of tractability, 2-Sat coincides with the problem 
Sat(S), where $S = \{T_0, T_1, T_2\}$ and 

- $T_0 = \{0, 1\}^2 \setminus \{(0, 0)\}$ (expressing the clause $(x \lor y)$); 

- $T_1 = \{0, 1\}^2 \setminus \{(1, 0)\}$ (expressing the clause $(\lnot x \lor y)$); 

- $T_2 = \{0, 1\}^2 \setminus \{(1, 1)\}$ (expressing the clause $(\lnot x \lor \lnot y)$). 

Positive-1-In-3-Sat is the following decision problem: given a 3CNF-formula 
such that each clause is of the form $(x \lor y \lor z)$, does there exist a truth 
assignment that makes true exactly one variable in each clause? This problem is 
known to be NP-complete ([GJ79, LO4, page 259]). A moment’s reflection 
reveals that Positive-1-In-3-Sat coincides with the problem Sat(S), where S is 
the singleton $\{R_{1/3}\}$ consisting of the relation 

$R_{1/3} = \{(1, 0, 0), (0, 1, 0), (0, 0, 1)\}$. 

Furthermore, it is easy to see that several other variants of Boolean satisfiability, 
including 1-In-3-Sat, Not-All-Equal-3-Sat and Horn 3-Sat, can be cast 
as Sat(S) problems for particular sets S of logical relations. □ 
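
To make the notion of a Sat(S) problem concrete, a CNF(S)-formula can be represented as a list of clauses, each pairing a logical relation with a tuple of arguments. The Python sketch below decides satisfiability by exhaustive search; the representation, the function name `satisfiable`, and the sample formulas are assumptions of this sketch, and the search takes exponential time.

```python
from itertools import product

# The relation of Positive-1-In-3-Sat, as a set of bit-tuples.
R_ONE_IN_THREE = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}

def satisfiable(clauses, variables):
    """Search for a truth assignment satisfying every clause.

    `clauses` is a list of (relation, arguments) pairs; each argument is a
    variable name (a string) or one of the constants 0 and 1.
    Returns a satisfying assignment as a dict, or None if there is none.
    """
    for bits in product((0, 1), repeat=len(variables)):
        env = dict(zip(variables, bits))
        # env.get(arg, arg) maps variables to their value and leaves constants as is.
        if all(tuple(env.get(arg, arg) for arg in args) in rel
               for rel, args in clauses):
            return env
    return None

# Sample instances chosen for this sketch.
yes_instance = satisfiable(
    [(R_ONE_IN_THREE, ("x", "y", "z")), (R_ONE_IN_THREE, ("x", "y", "w"))],
    ["x", "y", "z", "w"],
)
no_instance = satisfiable([(R_ONE_IN_THREE, ("x", "x", "x"))], ["x"])
```

The second sample instance is unsatisfiable because neither (0, 0, 0) nor (1, 1, 1) belongs to the relation, so no value of x makes exactly one argument true.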

The above examples demonstrate that the family of all Sat(S) problems 
constitutes a flexible and rich framework for expressing restricted cases of Boolean 
satisfiability. Schaefer [Sch78] studied the computational complexity of Sat(S) 
problems and obtained the following remarkable classification theorem: for 
every finite set S of logical relations, either Sat(S) is NP-complete or Sat(S) is 
solvable in polynomial time; moreover, there is an algorithm to decide whether 
Sat(S) is NP-complete or solvable in polynomial time. To appreciate 
Schaefer’s result, one should recall that Ladner [Lad75] showed that if P $\ne$ NP, 
then there are problems in NP that are neither NP-complete nor in P, i.e., 
there exist problems of intermediate computational complexity between 
NP-complete and polynomial-time solvable. Consequently, Schaefer’s result can be 
described as a dichotomy theorem asserting that no Sat(S) problem is of such 
intermediate computational complexity. In fact, Schaefer’s result was the first 
non-trivial dichotomy theorem for a family of NP-complete problems. Since 
that time, dichotomy theorems have been obtained for several other families 
of decision, counting, enumeration, and optimization problems (for instance, 
see [FHW80,HN90,Cre95,CH96,CH97,KSW97]). Overall, however, dichotomy 
theorems for families of algorithmic problems are rare; moreover, in view of 
Ladner’s theorem [Lad75], their existence cannot be taken for granted. 

Before stating Schaefer’s dichotomy theorem in precise terms, we need to 
introduce several necessary concepts. 

Definition 4: Let $\varphi$ be a propositional formula. 

- $\varphi$ is bijunctive if it is a 2CNF-formula, i.e., it is a conjunction of clauses each 
of which is a disjunction of at most two literals. 

- $\varphi$ is Horn if it is the conjunction of clauses each of which is a disjunction of 
literals such that at most one of them is a variable. 

- $\varphi$ is dual Horn if it is the conjunction of clauses each of which is a disjunction 
of literals such that at most one of them is a negated variable. 

- $\varphi$ is affine if it is the conjunction of subformulas each of which is an exclusive 
disjunction of literals or a negation of an exclusive disjunction of literals 
(by definition, an exclusive disjunction of literals is satisfied exactly when 
an odd number of these literals are true; we will use $\oplus$ as the symbol of the 
exclusive disjunction). 

Note that a formula $\varphi$ is affine precisely when the set of its satisfying assignments 
is the set of solutions of a system of linear equations over the field $\{0, 1\}$. □ 

Definition 5: Let R be a logical relation and S a finite set of logical relations. 

- R is bijunctive (Horn, dual Horn, or affine, respectively) if there is a 
propositional formula $\varphi$ which is bijunctive (Horn, dual Horn, or affine, respectively) 
and such that R coincides with the set of truth assignments satisfying $\varphi$. 

- S is Schaefer if at least one of the following four conditions holds: 

• every member of S is bijunctive; 

• every member of S is Horn; 

• every member of S is dual Horn; 

• every member of S is affine. 

— Otherwise, we say that S is non-Schaefer. □ 

There are simple criteria to determine whether a logical relation is bijunctive, 
Horn, dual Horn, or affine. In fact, a set of such criteria was already provided by 
Schaefer [Sch78]; moreover, Dechter and Pearl [DP92] gave even simpler criteria 
for a relation to be Horn or dual Horn. Each of these criteria involves a closure 
property of the logical relations at hand under a certain function. Specifically, a 
relation R is bijunctive if and only if for all $t_1, t_2, t_3 \in R$, we have that 
$(t_1 \lor t_2) \land (t_2 \lor t_3) \land (t_1 \lor t_3) \in R$, where the operators $\lor$ and $\land$ are applied 
coordinate-wise to the bit-tuples. Note that the $i$-th coordinate of the tuple 
$(t_1 \lor t_2) \land (t_2 \lor t_3) \land (t_1 \lor t_3)$ is equal to 1 exactly when the majority of the $i$-th 
coordinates of $t_1, t_2, t_3$ is equal to 1. Thus, this criterion states that R is bijunctive 
exactly when it is closed under coordinate-wise applications of the ternary majority 
function. R is Horn (respectively, dual Horn) if and only if for all $t_1, t_2 \in R$, we 
have that $t_1 \land t_2 \in R$ (respectively, $t_1 \lor t_2 \in R$). Finally, R is affine if and only if 
for all $t_1, t_2, t_3 \in R$, we have that $t_1 \oplus t_2 \oplus t_3 \in R$. As an example, it is easy to 
apply these criteria to the ternary relation $R_{1/3} = \{(1, 0, 0), (0, 1, 0), (0, 0, 1)\}$ and 
verify that $R_{1/3}$ is neither bijunctive, nor Horn, nor dual Horn, nor affine. 
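
Since each criterion is a closure condition over a finite relation, all four can be tested directly by enumeration. The following Python sketch implements the closure tests as stated above; the function names are choices of this sketch, and relations are represented as sets of bit-tuples.

```python
from itertools import product

def closed_under(rel, op, arity):
    """Is `rel` closed under the coordinate-wise application of `op`
    to every choice of `arity` tuples from `rel`?"""
    return all(
        tuple(op(*bits) for bits in zip(*chosen)) in rel
        for chosen in product(rel, repeat=arity)
    )

def is_bijunctive(rel):
    # Closure under the ternary majority function.
    return closed_under(rel, lambda a, b, c: (a | b) & (b | c) & (a | c), 3)

def is_horn(rel):
    return closed_under(rel, lambda a, b: a & b, 2)

def is_dual_horn(rel):
    return closed_under(rel, lambda a, b: a | b, 2)

def is_affine(rel):
    return closed_under(rel, lambda a, b, c: a ^ b ^ c, 3)

def is_schaefer(relations):
    """A set of relations is Schaefer if one property holds for all its members."""
    tests = (is_bijunctive, is_horn, is_dual_horn, is_affine)
    return any(all(test(rel) for rel in relations) for test in tests)

R_ONE_IN_THREE = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}
T0 = {(0, 1), (1, 0), (1, 1)}          # the clause (x OR y) of 2-Sat
```

Each test inspects at most $|R|^3$ combinations of tuples, so the criteria are checkable in time polynomial in the size of the relation.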

There are well-known polynomial-time algorithms for the satisfiability prob- 
lem for the class of all bijunctive formulas (2-Sat), the class of all Horn formulas, 
and the class of all dual Horn formulas. Moreover, if S is an affine set of logical 
relations, then Sat(S) is solvable in polynomial time using Gaussian 
elimination. Schaefer’s seminal discovery was that these four cases are the only ones 
that give rise to tractable cases of Sat(S). 

Theorem 6: [Schaefer’s Dichotomy Theorem, [Sch78]] Let S be a finite set of 
logical relations. If S is Schaefer, then Sat(S) is solvable in polynomial time; 
otherwise, it is NP-complete. 

As an application, Theorem 6 immediately implies that Positive-1-In-3-Sat 
is NP-complete, since it coincides with Sat($\{R_{1/3}\}$), and $\{R_{1/3}\}$ is not Schaefer. 

To obtain the above dichotomy theorem, Schaefer had to first establish a 
result concerning the expressive power of CNF(S)-formulas. Informally, this 
result asserts that if S is a non-Schaefer set of logical relations, then 
CNF(S)-formulas have extremely high expressive power, in the sense that every logical 
relation can be defined from a CNF(S)-formula using existential quantification. 



Theorem 7: [Schaefer’s Expressibility Theorem, [Sch78]] Let S be a finite set 
of logical relations. If S is non-Schaefer, then for every $k$-ary logical relation R 
there is a CNF(S)-formula $\varphi(x_1, \dots, x_k, z_1, \dots, z_m)$ such that R coincides with 
the set of all truth assignments to the variables $x_1, \dots, x_k$ that satisfy the formula 

$(\exists z_1) \cdots (\exists z_m)\, \varphi(x_1, \dots, x_k, z_1, \dots, z_m)$. 

3 Model Checking and Inference in Circumscription 

Schaefer’s framework makes it possible to introduce and study restricted cases 
of the model checking problem and the inference problem for propositional cir- 
cumscription. 

Definition 8: Let S be a finite set of logical relations. 

- MC-Circ(S) is the following decision problem: given a CNF(S)-formula $\varphi$ 
and a truth assignment $\alpha$, is $\alpha$ a minimal model of $\varphi$? 

- Inf-Circ(S) is the following decision problem: given a CNF(S)-formula $\varphi$ 
and a CNF-formula $\psi$, is $\psi$ true in every minimal model of $\varphi$? □ 

Using the definitions, it is easy to see that, for every finite set S of 
logical relations, MC-Circ(S) is in coNP and Inf-Circ(S) is in $\Pi_2^P$. There are 
natural sets S of logical relations such that MC-Circ(S) is coNP-complete 
and Inf-Circ(S) is $\Pi_2^P$-complete. In particular, this holds true for the set 






Lefteris M. Kirousis and Phokion G. Kolaitis 



$S = \{R_0, R_1, R_2, R_3\}$ of logical relations in Example 3 that give rise to 3-Sat 
(see [Cad92,Cad93,EG93]). In contrast, as pointed out in [Cad92,Cad93,CL94], 
if S is a bijunctive or a dual Horn set of logical relations, then MC-Circ(S) 
is in P and Inf-Circ(S) is in coNP. Moreover, if S is a Horn set of logical 
relations, then both MC-Circ(S) and Inf-Circ(S) are in P; this is so because 
every satisfiable Horn formula has a minimum (unique minimal) satisfying truth 
assignment that can be found in polynomial time. 
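
The minimum satisfying assignment of a Horn formula can be computed by forward propagation: start with all variables false and set a variable to true only when some clause forces it. The Python sketch below is one simple way to do this; the clause representation and the function name are assumptions of this sketch, and the loop is a straightforward polynomial-time variant rather than an optimized linear-time algorithm.

```python
def horn_minimum_model(clauses, variables):
    """Return the minimum model of a Horn formula, or None if it is unsatisfiable.

    Each clause is a pair (body, head): `body` is the set of variables that
    occur negated in the clause and `head` is its single unnegated variable,
    or None if the clause has no unnegated variable.
    """
    true_vars = set()
    changed = True
    while changed:
        changed = False
        for body, head in clauses:
            if body <= true_vars:          # all negated variables are forced true
                if head is None:
                    return None            # the clause cannot be satisfied
                if head not in true_vars:
                    true_vars.add(head)    # the head is forced to be true
                    changed = True
    return {v: int(v in true_vars) for v in variables}

# Sample Horn formula chosen for this sketch:  x,  (not x or y),  (not y or not z)
sample = [(set(), "x"), ({"x"}, "y"), ({"y", "z"}, None)]
minimum = horn_minimum_model(sample, ["x", "y", "z"])
```

Because only forced variables are ever set to true, the resulting assignment lies below every other satisfying assignment, so model checking reduces to comparing the given assignment with it and inference reduces to evaluating the second formula on it.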

In view of Schaefer’s dichotomy theorem for Boolean satisfiability problems, 
it is natural to ask whether similar dichotomy theorems can be obtained for the 
family MC-Circ(S) of model checking problems for propositional 
circumscription and the family Inf-Circ(S) of inference problems for propositional 
circumscription, where S is a finite set of logical relations. At first, one may expect 
that, if such dichotomy theorems hold, then the boundary of the dichotomy will 
be the same as that in Schaefer’s dichotomy theorem. In particular, one may 
expect that if S is a non-Schaefer set of logical relations, then MC-Circ(S) 
should be coNP-complete and Inf-Circ(S) should be $\Pi_2^P$-complete. 
Nevertheless, this turns out to be a rather naive expectation. Indeed, consider the set 
$S = \{R_{1/3}\}$ consisting of the logical relation $R_{1/3} = \{(1, 0, 0), (0, 1, 0), (0, 0, 1)\}$. 
As seen earlier, S is non-Schaefer and so Sat(S) is NP-complete (recall that in 
this case Sat(S) is Positive-1-In-3-Sat). It is easy to see, however, that if $\varphi$ 
is a CNF(S)-formula, then every satisfying truth assignment of $\varphi$ is a minimal 
model of $\varphi$. Consequently, MC-Circ(S) is in P (in fact, it is solvable in linear 
time) and Inf-Circ(S) is in coNP (in fact, it is coNP-complete). 

In [KK01a], the following dichotomy theorem was established for the family 
MC-Circ(S) of model checking problems for propositional circumscription: if S 
is a finite set of logical relations, then either MC-Circ(S) is coNP-complete 
or MC-Circ(S) is in P. Furthermore, in [KK01b], the following dichotomy 
theorem was established for the family Inf-Circ(S) of inference problems for 
propositional circumscription: if S is a finite set of logical relations, then either 
Inf-Circ(S) is $\Pi_2^P$-complete or Inf-Circ(S) is in coNP. It was also shown that 
the boundaries in these two dichotomies coincide, but differ from the boundary 
in Schaefer’s dichotomy theorem for Boolean satisfiability problems. These new 
dichotomy theorems were proved by first establishing corresponding dichotomy 
theorems in a key special case and then using the results for this special case as 
a stepping stone towards the full dichotomy theorems. 

Definition 9: A $k$-ary logical relation R is 1-valid if it contains the all-ones 
$k$-tuple $(1, \dots, 1)$. A set S of logical relations is 1-valid if every relation in S 
is 1-valid. □ 

For example, the logical relation $K = \{(1, 1, 1), (0, 1, 0), (0, 0, 1)\}$ is 1-valid. 
Note that the set $S = \{R_0, R_1, R_2, R_3\}$ in Example 3 is not 1-valid, since the 
relation $R_3$ is not 1-valid. In contrast, the set $P = \{R_0, R_1, R_2\}$ is 1-valid. 

We now have all the prerequisites to state our dichotomy theorems for the 
model checking problem MC-Circ(S) and the inference problem Inf-Circ(S), 
when S is a 1-valid set of logical relations. In this case, the boundary of the 
dichotomies coincides with the boundary in Schaefer’s dichotomy theorem. 
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Theorem 10: [KK01a,KK01b] Let S be a 1-valid set of logieal relations. 

~ If S is Schaefer, then MC-CiRC(S') is in P; otherwise, it is coNF -complete. 

- If S is Schaefer, then Inf-Circ(5') is in coNP; otherwise, it is H 2 -complete. 
Actually, if S is non-Schaefer, then even the following case of Inf-Circ(S') 
is II 2 -complete: given a CNF (S) -formula (p and a negative literal ~^u, does 
^ hciRC 

Moreover, there is a polynomial-time algorithm to decide, given a finite
1-valid set S of logical relations, whether MC-Circ(S) is in P or coNP-complete,
and also whether Inf-Circ(S) is in coNP or Π_2^P-complete.

The following examples illustrate the preceding Theorem 10 and provide new 
instances of restricted cases of the model checking problems and the inference 
problem for propositional circumscription having the same inherent complexity 
as the general case. 

Example 11: Consider again the logical relation K = {(1, 1, 1), (0, 1, 0),
(0, 0, 1)}. Using the closure properties that characterize when a logical relation
is bijunctive, Horn, dual Horn, or affine, it is easy to see that K is none of the
above. For instance, K is not Horn because (0,1,0) ∧ (0,0,1) = (0,0,0) ∉ K;
similarly, K is not affine because (1,1,1) ⊕ (0,1,0) ⊕ (0,0,1) = (1,0,0) ∉ K.
Consequently, Theorem 10 implies that MC-Circ({K}) is coNP-complete and
Inf-Circ({K}) is Π_2^P-complete. □
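
The closure properties invoked in Example 11 can be checked mechanically. The following is a minimal Python sketch, not taken from [KK01a,KK01b]. It assumes the standard characterizations (Horn: closed under componentwise ∧ of two tuples; dual Horn: closed under componentwise ∨ of two tuples; bijunctive: closed under componentwise majority of three tuples; affine: closed under componentwise ⊕ of three tuples), and it assumes that a set of relations is Schaefer when all of its relations share at least one of these four properties. All function names are hypothetical.

```python
from itertools import product

def closed_under(relation, op, arity):
    """Check closure of `relation` under `op` applied componentwise to `arity` tuples."""
    rel = set(relation)
    return all(
        tuple(op(*bits) for bits in zip(*chosen)) in rel
        for chosen in product(rel, repeat=arity)
    )

def is_horn(relation):
    return closed_under(relation, lambda a, b: a & b, 2)

def is_dual_horn(relation):
    return closed_under(relation, lambda a, b: a | b, 2)

def is_bijunctive(relation):
    # Componentwise majority of three bits.
    return closed_under(relation, lambda a, b, c: (a & b) | (b & c) | (a & c), 3)

def is_affine(relation):
    return closed_under(relation, lambda a, b, c: a ^ b ^ c, 3)

def is_schaefer(relations):
    """Assumed definition: all relations share at least one of the four properties."""
    relations = list(relations)
    tests = (is_horn, is_dual_horn, is_bijunctive, is_affine)
    return any(all(test(r) for r in relations) for test in tests)

# The relation K of Example 11.
K = {(1, 1, 1), (0, 1, 0), (0, 0, 1)}
print(is_horn(K), is_dual_horn(K), is_bijunctive(K), is_affine(K), is_schaefer([K]))
```

Under these assumptions, all five tests fail for K, in agreement with Example 11. The check is a brute-force enumeration over pairs or triples of tuples, so it runs in time polynomial in the size of the relation.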

Example 12: Consider the 1-valid set P = {R_0, R_1, R_2}, where, as seen earlier,
R_0 = {0,1}^3 \ {(0,0,0)} (expressing the clause (x ∨ y ∨ z)), R_1 = {0,1}^3 \
{(1,0,0)} (expressing the clause (¬x ∨ y ∨ z)), and R_2 = {0,1}^3 \ {(1,1,0)}
(expressing the clause (¬x ∨ ¬y ∨ z)). Thus, the class of CNF(P)-formulas consists
of all 3CNF-formulas that do not contain a clause of the form (¬x ∨ ¬y ∨ ¬z).
Using the closure properties, it is easy to verify that R_1 is not bijunctive, Horn,
or affine, and that R_2 is not dual Horn. Consequently, Theorem 10 implies that
MC-Circ(P) is coNP-complete and Inf-Circ(P) is Π_2^P-complete. □

Theorem 10 can be used as a stepping stone to obtain dichotomy theorems
for the family of all MC-Circ(S) problems and the family of all Inf-Circ(S)
problems, where S is an arbitrary set of logical relations. To this effect, we use
the following crucial concept, which was first introduced in [KK01a].

Definition 13: Let R be a k-ary logical relation. We say that a logical relation T
is a 0-section of R if either T is the relation R itself or T can be defined
from the formula R'(x_1, ..., x_k) by replacing at least one, but not all, of the
variables x_1, ..., x_k by 0. □

To illustrate this concept, observe that the 1-valid logical relation {(1)}
is a 0-section of R_{1/3} = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}, since it is definable by
R'_{1/3}(x_1, 0, 0). Note that the logical relation {(1, 0), (0, 1)} is also a 0-section
of R_{1/3}, since it is definable by the formula R'_{1/3}(0, x_2, x_3), but it is not 1-valid.
In fact, it is easy to verify that {(1)} is the only logical relation that is both
1-valid and a 0-section of R_{1/3}.
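
The 0-sections of a small relation can be enumerated directly from Definition 13. The sketch below is only an illustration under our own encoding: a relation is a set of 0/1 tuples, and each 0-section is reported together with its arity, so that empty sections of different arities stay distinct. The helper names are hypothetical.

```python
from itertools import product

def zero_sections(relation, k):
    """Return the 0-sections of a k-ary relation as (arity, frozenset_of_tuples) pairs.

    A 0-section fixes some coordinates (possibly none, but not all) to 0 and
    keeps the remaining coordinates of the tuples that agree with that choice.
    """
    rel = set(relation)
    sections = set()
    for fixed in product((False, True), repeat=k):
        if all(fixed):
            continue  # at least one variable must remain free
        free = [i for i in range(k) if not fixed[i]]
        section = frozenset(
            tuple(t[i] for i in free)
            for t in rel
            if all(t[i] == 0 for i in range(k) if fixed[i])
        )
        sections.add((len(free), section))
    return sections

def one_valid_zero_sections(relation, k):
    """Keep only the 0-sections that contain the all-ones tuple."""
    return {(n, sec) for n, sec in zero_sections(relation, k) if (1,) * n in sec}

# The relation consisting of all triples with exactly one 1.
R_1_3 = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}
print(one_valid_zero_sections(R_1_3, 3))
```

For this relation the only 1-valid 0-section found is the unary relation {(1)}, which matches the observation above. The enumeration ranges over all 2^k choices of fixed coordinates, which is affordable because k is the arity of a single relation.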

Theorem 14: [KK01a,KK01b] Let S be a set of logical relations and let S* be
the set of all logical relations T such that T is both 1-valid and a 0-section of
some relation in S.

- If S* is Schaefer, then MC-Circ(S) is in P; otherwise, it is coNP-complete.

- If S* is Schaefer, then Inf-Circ(S) is in coNP; otherwise, it is Π_2^P-complete.
  Actually, if S* is non-Schaefer, then even the following case of Inf-Circ(S)
  is Π_2^P-complete: given a CNF(S)-formula φ and a negative literal ¬u, does
  φ ⊨_CIRC ¬u?

Moreover, there is a polynomial-time algorithm to decide, given a finite
1-valid set of logical relations, whether MC-Circ(S) is in P or coNP-complete,
and also whether Inf-Circ(S) is in coNP or Π_2^P-complete.

We now present several different examples that illustrate the power of The- 
orem 14. The first example shows how the main result in [EG93] can be easily 
derived from Theorem 14. 

Example 15: Let S = {R_0, R_1, R_2, R_3} be the set of logical relations that give
rise to 3-Sat. Since R_0, R_1, R_2 are 1-valid logical relations, they are members
of S*. It follows that S* is not Schaefer, since, as seen earlier, R_1 is not bijunctive,
Horn or affine, and R_2 is not dual Horn. Theorem 14 immediately implies that
Inf-Circ(S) is Π_2^P-complete. □

Example 16: Consider the set S = {R_0, R_3}, where R_0 and R_3 are as in the
preceding Example 15. In this case, Sat(S) is the problem Monotone 3-Sat,
that is to say, the restriction of 3-Sat to 3CNF-formulas in which every clause
is either the disjunction of positive literals or the disjunction of negative literals.
It is well known that this problem is NP-complete (this can also be derived from
Schaefer’s Dichotomy Theorem 6). It is not hard to verify that every relation
in S* is dual Horn (for instance, S* contains R_0, which is dual Horn).
Consequently, Theorem 14 implies that MC-Circ(S) is in P and Inf-Circ(S) is in
coNP. □

The preceding example shows that the boundary in Schaefer’s dichotomy the- 
orem for Boolean satisfiability is different from the boundary in the dichotomy 
theorem for the model checking problem and the inference problem in proposi- 
tional circumscription. Our final example provides several other instances of this 
phenomenon. 

Example 17: If m and n are two positive integers with m < n, then R_{m/n} is
the n-ary logical relation consisting of all n-tuples that have m ones and n - m
zeros. It is easy to see that R_{m/n} is not Schaefer. Consequently, if S is a set of
logical relations each of which is of the form R_{m/n} for some m and n with m < n,
then Sat(S) is NP-complete. In contrast, S* is easily seen to be Horn (and,
hence, Schaefer), since every relation T in S* is a singleton T = {(1, ..., 1)}
consisting of the m-ary all-ones tuple for some m. Consequently, Theorem 14
implies that MC-Circ(S) is in P and Inf-Circ(S) is in coNP.
This family of examples contains Positive-1-In-3-Sat as the special case
where S = {R_{1/3}}. □

Remark 18 : The proofs of Theorem 10 and Theorem 14 can be found in 
[KK01a,KK01b]. They make use of the aforementioned Schaefer’s expressibility 
theorem (Theorem 7), additional specialized expressibility results, and a series 
of delicate reductions between problems. 

It should be pointed out that the problem actually studied in [KK01a] is
the minimal satisfiability problem Min Sat(S), which is the complement of
MC-Circ(S): given a CNF(S)-formula φ and a satisfying truth assignment α,
is there a satisfying truth assignment β of φ such that β < α? Consequently,
the dichotomy obtained in [KK01a] is a dichotomy between NP-completeness
vs. membership in P, and clearly implies the dichotomy for MC-Circ(S).
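
The relationship between Min Sat(S) and MC-Circ(S) can be made concrete with a brute-force sketch. The encoding is our own assumption: a CNF(S)-formula is a list of (relation, scope) pairs, where a scope lists variable names or the constants 0 and 1, and a truth assignment is a dictionary from variable names to 0/1. The search is exponential and is meant only to spell out the definitions, not to reflect the algorithms of [KK01a].

```python
from itertools import product

def satisfies(formula, assignment):
    """formula: list of (relation, scope); scope entries are variable names or 0/1."""
    def value(term):
        return term if isinstance(term, int) else assignment[term]
    return all(tuple(value(t) for t in scope) in relation for relation, scope in formula)

def has_smaller_model(formula, alpha):
    """Min Sat: is there a satisfying beta with beta <= alpha pointwise and beta != alpha?"""
    ones = [v for v, bit in alpha.items() if bit == 1]
    for bits in product((0, 1), repeat=len(ones)):
        if all(bits):
            continue  # this choice gives beta == alpha
        beta = dict(alpha)
        beta.update(zip(ones, bits))
        if satisfies(formula, beta):
            return True
    return False

def is_minimal_model(formula, alpha):
    """MC-Circ: alpha satisfies the formula and no strictly smaller assignment does."""
    return satisfies(formula, alpha) and not has_smaller_model(formula, alpha)

# The relation expressing the clause (x or y or z).
R0 = set(product((0, 1), repeat=3)) - {(0, 0, 0)}
phi = [(R0, ("x", "y", "z"))]
print(is_minimal_model(phi, {"x": 1, "y": 1, "z": 0}))
print(is_minimal_model(phi, {"x": 1, "y": 0, "z": 0}))
```

In this toy instance the first assignment is not minimal, because setting y to 0 still satisfies the clause, while the second one is. Only the variables that α sets to 1 need to be enumerated, since any β < α must agree with α on the variables set to 0.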

Note that here the logical constants 0 and 1 were allowed in the construction
of CNF(S)-formulas. Schaefer [Sch78] also obtained a dichotomy theorem
for the satisfiability problem Sat(S), when restricted to CNF(S)-formulas without
constants; this result requires the deployment of additional technical machinery.
Dichotomy theorems for MC-Circ(S) and Inf-Circ(S), when restricted to
CNF(S)-formulas without constants, were also obtained in [KK01a,KK01b]. □

Remark 19: The dichotomy theorem for the family Inf-Circ(S) can be
interpreted as asserting that, for every finite set S of logical relations, either
Inf-Circ(S) is as hard as the full inference problem for propositional
circumscription (Π_2^P-complete) or Inf-Circ(S) is no harder than the inference problem
for ordinary propositional logic (since the latter is coNP-complete).

It should be noted that researchers in computational complexity have isolated
and studied several interesting complexity classes between coNP and Π_2^P, each
with its own distinctive complete problems, such as the class DP of problems
that are conjunctions of NP and coNP predicates. In fact, an entire hierarchy
of complexity classes, known as the Boolean Hierarchy BH, is sandwiched
between coNP and Π_2^P (see [Joh90]). Thus, our dichotomy theorem for the family
Inf-Circ(S) reveals a dramatic gap in the complexity of the inference problem
for propositional circumscription between sets S of logical relations that are
Schaefer and those that are non-Schaefer. □

4 Open Problems 

The dichotomy theorem for the family Inf-Circ(S), where S is a finite set of
logical relations, characterizes the “truly hard” (Π_2^P-complete) cases of the
inference problem for propositional circumscription in Schaefer’s framework, but
leaves open the possibility that further distinctions can be made between the
“easier” cases of this problem. To this effect, we conjecture that a trichotomy
theorem holds for the family Inf-Circ(S). Specifically, we conjecture that, for
every finite set S of logical relations, exactly one of the following three
alternatives holds:

1. Inf-Circ(S) is Π_2^P-complete;

2. Inf-Circ(S) is coNP-complete;

3. Inf-Circ(S) is in P.

In view of the results described here, it remains to show that a dichotomy
theorem holds for Inf-Circ(S), when S is a Schaefer set of logical relations. Although
partial results in this direction have been obtained in [CL94], much more remains
to be done. In particular, the exact complexity of Inf-Circ(S) is not known,
when S is an affine set of logical relations.

All dichotomy theorems described here are rather special to Boolean logic. 
Schaefer [Sch78] raised the problem of establishing dichotomy theorems for sat- 
isfiability problems over domains with at least three elements, i.e., dichotomy 
theorems for many-valued propositional logic. This problem remains open to
date with no solution in sight, even for the case of 3-valued logic.

Abstract. Data integration is the problem of combining the data residing
at different sources, and providing a unified view of these data,
called global schema, which can be queried by the user. Interest in
this kind of system has been growing continuously in recent years.
However, the design of a data integration system is a very complex task,
and several issues remain open, including how to express the relation
between the global schema and the sources, and how to process queries
expressed on the global schema. In this paper we deal with these two
problems, by presenting a logical framework for data integration, and
by discussing the various choices for both the specification of a data
integration system, and the design of query answering methods. Also,
we elaborate on the observation that, in real world scenarios, the case
of mutually inconsistent local databases will be very common, and we
present the basic ideas in order to extend the integration framework with
suitable nonmonotonic reasoning features for dealing with this case.



1 Introduction 

Data integration is the problem of combining the data residing at different
sources, and providing the user with a unified view of these data, called global
schema. The global schema is therefore a reconciled view of
the information, which can be queried by the user. It is the task of the data
integration system to free the user from having to know where the data are and
how they are structured at the sources.

Interest in this kind of system has been growing continuously in recent
years. However, the design of a data integration system is a very complex
task, and several issues remain open. Two main problems complicate the task:

1. How to express the relation between the global schema and the sources, 

2. How to process queries expressed on the global schema. 

With regard to Problem (1), two basic approaches have been used to specify
the relation between the sources and the global schema. The first approach, called
global-as-view (or query-based), requires that the global schema be expressed in
terms of the data sources. More precisely, to every concept of the global schema,
a view over the data sources is associated, so that its meaning is specified in
terms of the data residing at the sources. The second approach, called
local-as-view (or source-based), requires the global schema to be specified independently
from the sources. The relationships between the global schema and the sources
are established by defining every source as a view over the global schema. Thus,
in the local-as-view approach, we specify the meaning of the sources in terms of
the concepts in the global schema. It is clear that the latter approach favors the
extensibility of the integration system, and provides a more appropriate setting
for its maintenance. For example, adding a new source to the system requires
only providing the definition of the source, and does not necessarily involve
changes in the global schema. By contrast, in the global-as-view approach,
adding a new source typically requires changing the definition of the concepts in
the global schema. A comparison between the two approaches is reported in [20].

Problem (2) is concerned with the choice of the method for computing the 
answer to queries posed in terms of the global schema. While query answering 
in the global-as-view approach typically reduces to unfolding, an integration 
system based on the local-as-view approach must resort to more sophisticated 
query processing techniques. The main issue is that the system should be able 
to reason on the mapping so as to re-express the query in terms of a suitable 
set of queries posed to the sources. In this reformulation process, the crucial 
step is deciding how to decompose the query on the global schema into a set of 
subqueries on the sources, based on the meaning of the sources in terms of the 
concepts in the global schema. The computed subqueries are then shipped to 
the sources, and the results are assembled into the final answer. 

Independently of the method used for the specification of the mapping
between the global schema and the source schemas, it is easy to see that query
processing in data integration is related to query answering using views. In turn,
query answering using views can be seen as a form of reasoning with incomplete
information. The interested reader is referred to [21] for a survey on this subject.
Query answering using views has been investigated in recent years in the
context of simplified frameworks. In [16,18], the problem has been studied for the
case of conjunctive queries (with or without arithmetic comparisons), in [2] for 
disjunctive views, in [19,10,13] for queries with aggregates, in [11] for recursive 
queries and nonrecursive views, and in [6,7] for several variants of regular path 
queries. Comprehensive frameworks for view-based query answering, as well as 
several interesting results for various query languages, are presented in [12,1]. 

Query answering using views is also tightly related to query rewriting 
[16,11,20]. In general, a rewriting of a query with respect to a set of views is 
a function that, given the extensions of the views, returns a set of tuples that is 
contained in the answer set of the query with respect to the views. Usually, one 
fixes a priori the language in which to express rewritings (e.g., unions of con- 
junctive queries), and then looks for the best possible rewriting expressible in 
such a language. On the other hand, we may call perfect a rewriting that returns 
exactly the answer set of the query with respect to the views, independently 
of the language in which it is expressed. Hence, if an algorithm for answering 
queries using views exists, it can be viewed as a perfect rewriting [8,9]. 

In this paper, we present a logical framework for data integration, and we
discuss the various choices for both the specification of a data integration system,
and the design of query answering methods. Also, we elaborate on the
observation that, in real world scenarios, the case of mutually inconsistent source
databases will be very common, and we present the basic ideas in order to
extend the integration framework with suitable nonmonotonic reasoning features
for dealing with this case.

The paper is organized as follows. In the next section we set up a formal 
framework for data integration, based on first order logic. In Section 3, we discuss 
three basic means for specifying the mapping between the global schema and 
the source schemas. In Section 4 we extend the framework in order to cope with 
the problem of integrating incoherent source databases. Section 5 concludes the 
paper. 

2 Framework 

In this section we set up a formal framework for data integration systems (DISs).
In what follows, one of the main aspects is the definition of the semantics of both
the DIS, and of queries posed to the global schema. To keep things simple,
we will use in the following a unique semantic domain Δ, composed of a fixed,
infinite set of symbols.

Formally, a DIS D is a triple ⟨G, S, M_{G,S}⟩, where G is the global schema, S
is the set of source schemas, and M_{G,S} is the mapping between G and the source
schemas in S.

We denote with A_G the alphabet of terms of the global schema, and we assume
that the global schema G of a DIS is expressed as a theory (named simply G) in
a logic L_G.

We assume that we have a set S of n source schemas S_1, ..., S_n. We denote with
A_{S_i} the alphabet of terms of the source schema S_i. We also denote with A_S the
union of all the A_{S_i}'s. We assume that the various A_{S_i}'s are mutually disjoint,
and each one is disjoint from the alphabet A_G. We assume that each source
schema is expressed as a theory (named simply S_i) in a logic L_{S_i}, and we use S
to denote the collection of theories S_1, ..., S_n.

The mapping M_{G,S} is the heart of the DIS, in that it specifies how the
concepts¹ in the global schema and in the source schemas map to each other.
We discuss this aspect more deeply in the next section. Here, we simply assume
that M_{G,S} is an appropriate specification of how the concepts in the various
schemas map to each other.

¹ Here and below we use the term “concept” for denoting a concept of the schema,
which in turn can be represented either by a class or by a relation (not necessarily
atomic) in logic.

Intuitively, in specifying the semantics of a DIS, we have to start with a model
of the source schemas, and the crucial point is to specify which are the models of
the global schema. Thus, for assigning semantics to a DIS D = ⟨G, S, M_{G,S}⟩, we
start by considering a source model B for D, i.e., an interpretation that is a model
for all the theories of S. We call global interpretation for D any interpretation
for G. A global interpretation I for D is said to be a global model for D wrt B if:

- I is a model of G,

- I satisfies the mapping M_{G,S} wrt B.

In the next section, we will come back to the notion of satisfying a mapping wrt
a source model. The semantics of D, denoted sem(D), is defined as follows:

sem(D) = { I | there exists a source model B for D
           s.t. I is a global model for D wrt B }

Queries posed to a DIS D are expressed in terms of a query language Q_G
over the alphabet A_G and are intended to extract a set of tuples of elements of
Δ. Thus, every query has an associated arity, and the semantics of a query q of
arity n is defined as follows. The answer q^D of q to D is the set of tuples

q^D = { (c_1, ..., c_n) | for all I ∈ sem(D), (c_1, ..., c_n) ∈ q^I }

where q^I denotes the result of evaluating q in the interpretation I.
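
The definition of the answer as the set of tuples returned in every model can be illustrated with a small Python sketch. It rests on a strong simplifying assumption: sem(D) is in general an infinite set, whereas here the models are given as an explicit finite list of dictionaries mapping relation names to sets of tuples. The relation `teaches` and the function names are hypothetical.

```python
def certain_answers(query, models):
    """Intersect the results of `query` over an explicitly given, finite list of models."""
    models = list(models)
    if not models:
        # sem(D) is empty: the DIS is inconsistent and the definition degenerates.
        raise ValueError("no global model: the DIS is inconsistent")
    answers = set(query(models[0]))
    for interpretation in models[1:]:
        answers &= set(query(interpretation))
    return answers

# Two hypothetical global models over one binary relation teaches(person, course).
I1 = {"teaches": {("ann", "logic"), ("bob", "db")}}
I2 = {"teaches": {("ann", "logic"), ("bob", "ai")}}

def teachers(I):
    """Unary query: the persons who teach some course."""
    return {(p,) for p, _ in I["teaches"]}

def teaches(I):
    """Binary query: all (person, course) pairs."""
    return set(I["teaches"])

print(certain_answers(teachers, [I1, I2]))
print(certain_answers(teaches, [I1, I2]))
```

Both persons are returned by the unary query, since each teaches something in every listed model, while only the pair for ann survives in the binary query, because the two models disagree on the course taught by bob.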

3 Specifying the Mapping 

As we said before, the mapping M_{G,S} represents the heart of a DIS D =
⟨G, S, M_{G,S}⟩, and allows for mapping a concept in one schema into a view, i.e., a
query, over the other schemas. In this section we discuss the various ways that
one can use for specifying the mapping. The terminology used in this section is
inspired by [15,14]. In our analysis, we will concentrate on mapping with “sound”
views. More general kinds of mappings are discussed in [4].

3.1 Global-Centric Approach

In the global-centric approach (aka global-as-view approach), we assume we have
a query language V_S over the alphabet A_S, and the mapping between the global
and the source schemas is given by associating to each term in the global schema
a view, i.e., a query, over the sources. The intended meaning of associating to a
term C in G a query V_S over S, is that such a query represents the best way to
characterize the instances of C using the concepts in S. Let B be a source model
for D, and I a global interpretation for D. Then I satisfies the pair ⟨C, V_S⟩ in
M_{G,S} wrt B, if all the tuples satisfying V_S in B satisfy C in I. We say that I
satisfies the mapping M_{G,S} wrt B, if I satisfies every pair in M_{G,S} wrt B.

The global-centric approach is the one adopted in most data integration
systems. It is a common opinion that this mechanism allows for a simple query
processing strategy, which basically reduces to unfolding the query using the
definition specified in the mapping, so as to translate the query in terms of
accesses to the sources [20]. Recently, we have shown that in the case where
we add constraints (even of a very simple form) to the global schema, query
processing becomes harder, due to the need to deal with a form of incomplete
information.
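
The satisfaction condition for sound views and the unfolding strategy can be sketched in Python as follows. This is an illustration under assumptions of our own: interpretations are dictionaries from relation names to sets of tuples, views are plain functions over the source model, the global schema carries no constraints, and the query is monotone, so that evaluating it over the retrieved data plays the role of unfolding. All relation and function names are hypothetical.

```python
def retrieved_global_database(mapping, B):
    """Evaluate every view V_S over the source model B."""
    return {term: set(view(B)) for term, view in mapping.items()}

def satisfies_gav(mapping, B, I):
    """I satisfies <C, V_S> wrt B if every tuple of V_S in B belongs to C in I."""
    return all(set(view(B)) <= I.get(term, set()) for term, view in mapping.items())

# Hypothetical source model B with two source relations.
B = {
    "s1": {("ann", "rome"), ("bob", "vienna")},  # s1(name, city)
    "s2": {("ann", "cs")},                       # s2(name, department)
}

# Hypothetical mapping: each global term is paired with a view over the sources.
mapping = {
    "person": lambda src: {(n,) for n, _ in src["s1"]} | {(n,) for n, _ in src["s2"]},
    "works_in": lambda src: set(src["s2"]),
}

def query(I):
    """Names of persons who work in some department (a monotone query)."""
    return {(n,) for (n,) in I["person"] if any(m == n for m, _ in I["works_in"])}

I = retrieved_global_database(mapping, B)
print(satisfies_gav(mapping, B, I))
print(query(I))
```

Every global interpretation that extends the retrieved one also satisfies the mapping, which is why, in this constraint-free setting, a monotone query can be answered on the retrieved data alone. Once constraints are added to the global schema this shortcut is no longer sufficient, as noted above.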

3.2 Source-Centric Approach

In the source-centric approach (aka local-as-view approach), we assume we have
a query language V_G over the alphabet A_G, and the mapping between the global
and the source schemas is given by associating to each term in the source schemas
a view, i.e., a query, over the global schema. Again, the intended meaning of
associating to a term C in S a query V_G over G, is that such a query represents
the best way to characterize the instances of C using the concepts in G. Let B
be a source model for D, and I a global interpretation for D. Then I satisfies
the pair ⟨V_G, C⟩ in M_{G,S} wrt B, if all the tuples satisfying C in B satisfy V_G in
I. As in the global-centric approach, we say that I satisfies the mapping M_{G,S}
wrt B, if I satisfies every pair in M_{G,S} wrt B.

Recent work on data integration follows the source-centric approach [17,5,3]. 
The major challenge of this approach is that in order to answer a query expressed 
over the global schema, one must be able to reformulate the query in terms of 
queries to the sources. While in the global-centric approach such a reformulation 
is guided by the definitions in the mapping, here the problem requires a reasoning 
step, so as to infer how to use the sources for answering the query [8,3]. Many 
authors point out that, despite its difficulty, the source-centric approach better 
supports a dynamic environment, where source schemas can be added to the
system without the need of restructuring the global schema.
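
The following Python sketch illustrates why the source-centric approach calls for reasoning. It assumes, for illustration only, that interpretations are dictionaries from relation names to sets of tuples, that the global schema has no constraints, and that a single hypothetical source term `s` is described by a view over a hypothetical global relation `enrolled`.

```python
def satisfies_lav(mapping, B, I):
    """I satisfies <V_G, C> wrt B if every tuple of C in B is returned by V_G on I."""
    return all(B.get(term, set()) <= set(view(I)) for term, view in mapping.items())

# Hypothetical source term s(student), described as a view over the global
# relation enrolled(student, course): the students enrolled in some course.
mapping = {"s": lambda I: {(st,) for st, _ in I["enrolled"]}}
B = {"s": {("ann",)}}

# Three candidate global interpretations.
I1 = {"enrolled": {("ann", "db")}}
I2 = {"enrolled": {("ann", "ai")}}
I3 = {"enrolled": set()}

def enrolled_students(I):
    return {(st,) for st, _ in I["enrolled"]}

def enrolments(I):
    return set(I["enrolled"])

print(satisfies_lav(mapping, B, I1), satisfies_lav(mapping, B, I2), satisfies_lav(mapping, B, I3))
print(enrolled_students(I1) & enrolled_students(I2))
print(enrolments(I1) & enrolments(I2))
```

The source tells us that ann is enrolled in some course but not in which one, so I1 and I2 both satisfy the mapping while I3 does not. The query asking for enrolled students has ann as an answer in both models, whereas the query asking for enrolment pairs has no answer common to both; this is the kind of incomplete information that the reformulation step has to account for.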

3.3 Unrestricted Mapping 

In the unrestricted approach, we have both a query language V_S over the
alphabet A_S, and a query language V_G over the alphabet A_G, and the mapping
between the global and the source schemas is given by relating views over the
global schema to views over the source schemas. Again, the intended meaning of
relating the view V_G over the global schema to the view V_S over the source schemas
is that V_S represents the best way to characterize the objects satisfying V_G in
terms of the concepts in S. In other words, in the unrestricted approach we try
to combine and extend the representation power of the previous approaches. Let
B be a source model for D, and I a global interpretation for D. Then I satisfies
the pair ⟨V_G, V_S⟩ in M_{G,S} wrt B, if all the tuples satisfying V_S in B
satisfy V_G in I. Again, we say that I satisfies the mapping M_{G,S} wrt B, if I
satisfies every pair in M_{G,S} wrt B.

This approach is largely unexplored, mainly because it combines the difficul- 
ties of the other ones. However, we argue that, in real world settings, this is the 
only approach that provides the appropriate expressive power. 

4 Beyond First-Order Logic 

According to our definition of a DIS D, it is easy to see that it may happen that
no global model for D exists, even when at least one source model for D exists.
This may happen because knowledge in the various source schemas cannot be
completely reconciled in the global schema. In the formalization presented in
the previous sections, this situation gives rise to an inconsistent DIS D (i.e.,
sem(D) = ∅), which cannot support query processing.

A more general approach would be to provide a formalization that is able 
to support query processing even when the source schemas to be integrated are 
mutually incoherent. Here, we present a preliminary proposal aiming at this goal. 

The basic idea is that given a DIS D = ⟨G, S, M_{G,S}⟩ and a source model B
for D, we would like to focus our attention on those global interpretations I that
are models of the global schema G and that approximate as much as possible the
satisfaction relation for the mapping M_{G,S}. One way to formalize this idea is
to distinguish between strict mappings, as the ones considered in Section 3, and
loose mappings. In particular, we add to every pair in M_{G,S} a new item, which is
either strict, or loose, and then we define an ordering wrt B between the models
of G. We concentrate directly on the most general case of unrestricted mapping.

If I_1 and I_2 are two models of G, we say that I_1 is better than I_2 wrt
B, denoted as I_1 >_B I_2, iff for all triples ⟨V_G, V_S, x⟩ ∈ M_{G,S}, except for a
distinguished one ⟨V'_G, V'_S, loose⟩, we have that [...] = V_G^[...] and [...] = V_S^[...];
while for the distinguished triple ⟨V'_G, V'_S, x'⟩ we have that [...] = V_S^[...],
and there exists a tuple t ∈ V_S^[...] such that t ∈ [...] and t ∉ [...]. It is easy to
verify that the relation >_B is a partial order. With this notion in place we define
global models for D wrt B those models I of G that are maximal wrt >_B, i.e.,
for no other model I' of G, I' >_B I.

5 Conclusions 

We have illustrated a logic-based framework for data integration, and we have 
discussed several choices for the specification of the mapping between the global 
schema and the source schemas. The form of such a specification greatly
influences the method for query answering. As we said before, most of the
research work on data integration is based on first-order logic, following either
the global-centric or the source-centric approach. However, it is our opinion that,
in real world settings, the case of mutually inconsistent source databases will be 
very common. We have presented some preliminary ideas in order to extend the 
integration framework with suitable nonmonotonic reasoning features for dealing 
with this case, and we plan to study query processing strategies based on these 
ideas. 

Abstract. Nonmonotonic logic programming (NMLP) and inductive
logic programming (ILP) are two important extensions of logic program- 
ming. The former aims at representing incomplete knowledge and reason- 
ing with commonsense, while the latter targets the problem of inductive 
construction of a general theory from examples and background knowl- 
edge. NMLP and ILP thus have seemingly different motivations and 
goals, but they have much in common in the background of problems, 
and techniques developed in each field are related to one another. This 
paper presents techniques for combining these two fields of logic pro- 
gramming in the context of nonmonotonic inductive logic programming 
(NMILP). We review recent results and problems in realizing NMILP.



1 Introduction 

Representing knowledge in computational logic gives formal foundations of ar- 
tificial intelligence (AI) and provides computational methods for solving prob- 
lems. Logic programming supplies a powerful tool for representing declarative 
knowledge and computing logical inference. However, logic programming based 
on classical Horn logic is not sufficiently expressive for representing incomplete 
human knowledge, and is inadequate for characterizing nonmonotonic common- 
sense reasoning. Nonmonotonic logic programming (NMLP) [3,5] is introduced 
to overcome such limitations of Horn logic programming by extending the rep- 
resentation language and enhancing the inference mechanism. The purpose of 
NMLP is to represent incomplete knowledge and reason with commonsense in a 
program. 

On the other hand, machine learning is concerned with the problem of building
computer programs that automatically construct new knowledge and improve
with experience [27]. The primary inference used in learning is induction, which
constructs general sentences from input examples. Inductive Logic Programming
(ILP) [28,30,33] realizes inductive machine learning in logic programming, which
provides a formal background to inductive learning and has the advantage of
using computational tools developed in logic programming. The goal of ILP is the
inductive construction of first-order clausal theories from examples and
background knowledge.

NMLP and ILP thus have seemingly different motivations and goals. However,
they have much in common in the background of problems, and techniques
developed in each field are related to one another. First, the process of
discovering new knowledge by humans is the iteration of hypotheses generation and
revision, which is inherently nonmonotonic. Indeed, induction is nonmonotonic
reasoning in the sense that once induced hypotheses might be changed by the
introduction of new evidence. Second, induction problems assume background
knowledge which is incomplete, otherwise there is no need to learn. Therefore,
representing and reasoning with incomplete knowledge are vital issues in ILP.
Third, NMLP uses hypotheses in the process of commonsense reasoning, and
hypotheses generation is particularly important in abductive logic programming.
Abduction generates hypotheses in a different manner from induction, but they
are both inverse deduction and extend theories to account for evidence.
Indeed, abduction and induction interact, and work complementarily in many
phases [14]. Fourth, in NMLP updates of general rules are considered in the
context of intensional knowledge base update [6], while a similar problem is
captured in ILP as concept-learning [26]. It is argued in [9] that these two lines of
research handle the same problem when formulated in a logical framework. For these
reasons, it is clear that both NMLP and ILP cope with similar problems and
have close links to each other.

Comparing NMLP and ILP, NMLP performs default reasoning and derives 
plausible conclusions from incomplete knowledge bases. Various types of infer- 
ences and semantics are introduced to extract intuitive conclusions from a pro- 
gram. NMLP may change conclusions by the introduction of new information, 
but it has no mechanism of learning new knowledge from the input. By con- 
trast, ILP extends a theory by constructing new rules from input examples and 
background knowledge. Discovered rules reveal hidden laws between examples 
and background knowledge, and are also used for predicting unseen phenom- 
ena. However, the present ILP mostly considers Horn logic programs or classical 
clausal programs as background knowledge, and has limited applications to non- 
monotonic situations. 

Thus, both NMLP and ILP have limitations in their present frameworks and 
complement each other. Since both commonsense reasoning and machine learn- 
ing are indispensable for realizing intelligent information systems, combining 
techniques of the two fields in the context of nonmonotonic inductive logic pro- 
gramming (NMILP) is meaningful and important. Such combination will extend 
the representation language on the ILP side, while it will introduce a learning 
mechanism to programs on the NMLP side. Moreover, linking different exten- 
sions of logic programming will strengthen the capability of logic programming 
as a knowledge representation tool in AI. From the practical viewpoint, the com- 
bination will be beneficial for ILP to use well-established techniques in NMLP, 
and will open new applications of NMLP. 

NMLP realizes nonmonotonic reasoning using negation as failure (NAF). 
Some researchers in ILP, however, argue that negation as failure is inappropriate
in machine learning. In [8], the authors say:

  For concept learning, negation as failure (and the underlying closed world
  assumption) is unacceptable because it acts as if everything is known.
  Clearly, in learning this is not the case, since otherwise nothing ought to
  be learned.

Although the account is plausible, it does not justify excluding NAF in ILP.
Suppose that background knowledge is given as a Horn logic program, and the
CWA or NAF infers negative facts which are not derived from the program.
When new evidence $E$, initially assumed false under the CWA or NAF,
is observed, this just means that the old assumption $\neg E$ is rebutted. The task of
inductive learning is then to revise the old theory to explain the new evidence. On
the other hand, if one excludes NAF from a background program, one loses the means
of representing default negation in the program. This is a significant drawback in
representing knowledge and restricts the application of ILP. In fact, NAF enables
one to write shorter and simpler programs, and it appears in many basic but practical
Prolog programs such as computing set differences, finding the union/intersection of
two lists, etc. [42]. Horn ILP precludes every program including such rules with
NAF. Thus, NAF is also important in ILP, and the use of NAF never invalidates
the need for learning.

In the field of ILP, the so-called nonmonotonic problem setting [18] is often
considered. Given a background Horn logic program $P$ and a set $E$ of positive
examples, it computes a hypothesis $H$ which is satisfied in the least Herbrand
model of $P \cup E$. This is also called the weak setting of ILP [11]. In this setting,
any fact which is not derived from $P \cup E$ is assumed to be false under the closed
world assumption (CWA). By contrast, the strong setting of ILP computes a
hypothesis which, together with $P$, implies $E$, and does not imply negative
examples. The strong setting is usually employed in ILP and is also considered
in this paper (see Section 2.2).^1 The nonmonotonic setting is called “nonmonotonic”
in the sense that it performs a kind of default reasoning based on the
closed world assumption. Some systems take similar approaches using Clark’s
completion ([10], for instance). The above mentioned nonmonotonic setting is
clearly different from our problem setting. The former still considers an induction
problem within clausal logic, while we extend the problem to nonmonotonic
logic programs.

This paper presents techniques for realizing inductive machine learning in 
nonmonotonic logic programs. The paper is not intended to provide a compre- 
hensive survey of the state of the art, but mainly consists of recent research 
results of the author. The rest of this paper is organized as follows. Section 2 
reviews frameworks of NMLP and ILP. Section 3 presents various techniques for 
induction in nonmonotonic logic programs. Section 4 summarizes the paper and 
addresses open issues. 

^1 The weak setting is also called descriptive/confirmatory induction, while the strong
setting is called explanatory/predictive induction [15].

2 Preliminaries

2.1 Nonmonotonic Logic Programming 

Nonmonotonic logic programs considered in this paper are normal logic programs,
that is, logic programs with negation as failure.

A normal logic program (NLP) is a set of rules of the form: 

$$A \leftarrow B_1, \ldots, B_m, \mathit{not}\ B_{m+1}, \ldots, \mathit{not}\ B_n \qquad (1)$$

where each $A, B_i$ ($1 \le i \le n$) is an atom and $\mathit{not}$ represents negation as failure
(NAF). The left-hand side of $\leftarrow$ is the head, and the right-hand side is the
body of the rule. The conjunction in the body of (1) is identified with the set
$\{B_1, \ldots, B_m, \mathit{not}\ B_{m+1}, \ldots, \mathit{not}\ B_n\}$. For a rule $R$, $head(R)$ and $body(R)$ denote
the head of $R$ and the body of $R$, respectively. The conjunction in the body is
often written by the Greek letter $\Gamma$. A rule with the empty body $A \leftarrow$ is called a
fact, which is identified with the atom $A$. A rule with the empty head $\leftarrow \Gamma$ with
$\Gamma \neq \emptyset$ is also called an integrity constraint. Throughout the paper a program
means a normal logic program unless stated otherwise. A program $P$ is Horn
if no rule in $P$ contains NAF. A Horn program is definite if it contains no
integrity constraint. The Herbrand base $\mathcal{HB}$ of a program $P$ is the set of all
ground atoms in the language of $P$. Given the Herbrand base $\mathcal{HB}$, we define
$\mathcal{HB}^+ = \mathcal{HB} \cup \{\mathit{not}\ A \mid A \in \mathcal{HB}\}$. Any element in $\mathcal{HB}^+$ is called an LP-literal,
and an LP-literal of the form $\mathit{not}\ A$ is called an NAF-literal. We say that two
LP-literals $L_1$ and $L_2$ have the same sign if either ($L_1 \in \mathcal{HB}$ and $L_2 \in \mathcal{HB}$) or
($L_1 \notin \mathcal{HB}$ and $L_2 \notin \mathcal{HB}$). For an LP-literal $L$, $pred(L)$ denotes the predicate
in $L$ and $const(L)$ denotes the set of constants appearing in $L$. A program, a rule,
or an LP-literal is ground if it contains no variable. A program/rule containing
variables is semantically identified with its ground instantiation, i.e., the set
of ground rules obtained from the program/rule by substituting variables with
elements of the Herbrand universe in every possible way.

An interpretation is a subset of $\mathcal{HB}$. An interpretation $I$ satisfies the ground
rule $R$ of the form (1) if $\{B_1, \ldots, B_m\} \subseteq I$ and $\{B_{m+1}, \ldots, B_n\} \cap I = \emptyset$ imply
$A \in I$ (written as $I \models R$). In particular, $I$ satisfies the ground integrity constraint
$\leftarrow B_1, \ldots, B_m, \mathit{not}\ B_{m+1}, \ldots, \mathit{not}\ B_n$ if either $\{B_1, \ldots, B_m\} \setminus I \neq \emptyset$ or
$\{B_{m+1}, \ldots, B_n\} \cap I \neq \emptyset$. When a rule $R$ contains variables, $I \models R$ means that $I$
satisfies every ground instance of $R$. An interpretation which satisfies every rule
in a program is a model of the program. A model $M$ of a program $P$ is minimal
if there is no model $N$ of $P$ such that $N \subset M$. A Horn logic program has at
most one minimal model called the least model.

For the semantics of NLPs, we consider the stable model semantics [17] in
this paper. Given a program $P$ and an interpretation $M$, the ground Horn logic
program $P^M$ is defined as follows: the rule $A \leftarrow B_1, \ldots, B_m$ is in $P^M$ iff there
is a ground rule of the form (1) in the ground instantiation of $P$ such that
$\{B_{m+1}, \ldots, B_n\} \cap M = \emptyset$. If the least model of $P^M$ is identical to $M$, $M$ is called
a stable model of $P$. A program may have none, one, or multiple stable models
in general. A program having exactly one stable model is called categorical [3].
A stable model coincides with the least model in a Horn logic program. A locally
stratified program [36] has a unique stable model, which is called the perfect
model. Given a stable model $M$, we define $M^+ = M \cup \{\mathit{not}\ A \mid A \in \mathcal{HB} \setminus M\}$.

A program is consistent (under the stable model semantics) if it has a stable
model; otherwise a program is inconsistent. Throughout the paper, a program
is assumed to be consistent unless stated otherwise. If every stable model of a
program $P$ satisfies a rule $R$, it is written as $P \models_s R$. Else if no stable model of a
program $P$ satisfies a rule $R$, it is written as $P \models_s \mathit{not}\ R$. In particular, $P \models_s A$
if a ground atom $A$ is true in every stable model of $P$; and $P \models_s \mathit{not}\ A$ if $A$ is
false in every stable model of $P$. By contrast, if every model of $P$ satisfies $R$,
it is written as $P \models R$. Note that when $P$ is Horn, the meaning of $\models$ coincides
with the classical entailment.

2.2 Inductive Logic Programming 

A typical ILP problem is stated as follows. Given a logic program $B$ representing
background knowledge, a set $E^+$ of positive examples, and a set $E^-$ of
negative examples, find hypotheses $H$ satisfying^2

1. $B \cup H \models e$ for every $e \in E^+$.

2. $B \cup H \not\models f$ for every $f \in E^-$.

3. $B \cup H$ is consistent.

The first condition is called completeness with respect to positive examples,
and the second condition is called consistency with respect to negative examples.
It is also implicitly assumed that $B \not\models e$ for some $e \in E^+$ or $B \models f$ for some
$f \in E^-$, because otherwise there is no need to introduce $H$. A hypothesis $H$
covers (resp. uncovers) an example $e$ if $B \cup H \models e$ (resp. $B \cup H \not\models e$).

The goal of ILP is then to develop an algorithm which efficiently computes hy- 
potheses satisfying the above three conditions. Induction algorithms are roughly 
classified into two categories by the direction of searching hypotheses. A top- 
down algorithm firstly generates a most general hypothesis and refines it by 
means of specialization, while a bottom-up algorithm searches hypotheses by 
generalizing (positive) examples. Each algorithm locally alternates search direc- 
tions from general to specific and vice versa to correct hypotheses. Algorithms 
presented in Sections 3.1-3.3 of this paper are bottom-up in this sense.

An induction algorithm is correct if every hypothesis produced by the algo- 
rithm satisfies the above three conditions. By contrast, an induction algorithm is 
complete if it produces every rule satisfying the conditions. Note that the correct- 
ness is generally requested for algorithms, while the completeness is problematic 
in practice. For instance, consider the background program B and the positive 
example E such that 

$B$ : $r(f(x)) \leftarrow r(x)$,
      $q(a) \leftarrow$, $r(b) \leftarrow$.

$E$ : $p(a)$.

^2 When there is no negative example, $E^+$ is just written as $E$.

Then, any of the following rules

$p(x) \leftarrow q(x)$,
$p(x) \leftarrow q(x), r(b)$,
$p(x) \leftarrow q(x), r(f(b))$,

explains $p(a)$. Generally, there may be infinitely many solutions for explaining an
example, and designing a complete induction algorithm without any restriction 
is of little value in practice. In order to extract meaningful hypotheses, additional 
conditions are usually imposed on possible hypotheses to reduce the search space. 
Such a condition is called an induction bias and is defined as any information 
that syntactically or semantically influences learning processes. 

In the field of ILP, most studies consider a Horn logic program as background 
knowledge and induce Horn clauses as hypotheses. In this paper, we consider an 
NLP as background knowledge and induce hypothetical rules possibly containing 
NAF. In the next section, we give several algorithms which realize this. 

3 Induction in Nonmonotonic Logic Programs 

3.1 Least Generalization 

Generalization is a basic operation to perform induction. In his seminal work [34],
Plotkin introduces generalization in clausal theories based on subsumption. Given
two clauses $C_1$ and $C_2$, $C_1$ $\theta$-subsumes $C_2$ if $C_1\theta \subseteq C_2$ for some substitution $\theta$.
Then, $C_1$ is more general than $C_2$ under $\theta$-subsumption if $C_1$ $\theta$-subsumes $C_2$.
In normal logic programs, a subsumption relation between rules is defined as
follows.

Definition 3.1. (subsumption relations between rules) Let $R_1$ and $R_2$ be two
rules. Then, $R_1$ $\theta$-subsumes $R_2$ (written as $R_1 \succeq_\theta R_2$) if $head(R_1)\theta = head(R_2)$
and $body(R_1)\theta \subseteq body(R_2)$ hold for some substitution $\theta$. In this case, $R_1$ is said
to be more general than $R_2$ under $\theta$-subsumption.

Thus subsumption is defined for comparison of rules with the same predi- 
cate in the heads. The same definition is employed by Taylor [43]. Fogel and 
Zaverucha [16] discuss the effect of subsumption to reduce the search space in 
normal logic programs. 
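For function-free rules, the subsumption test of Definition 3.1 can be sketched as a backtracking search for a substitution. The encoding below is an assumption made for illustration: an LP-literal is a triple `(naf, predicate, args)`, a rule is a pair `(head, body)`, and, following the Prolog convention rather than the paper's lowercase $x$, variables are strings beginning with an uppercase letter.

```python
def lit(pred, *args, naf=False):
    """Build an LP-literal; naf=True stands for 'not pred(args)'."""
    return (naf, pred, args)

def is_var(term):
    return isinstance(term, str) and term[:1].isupper()

def match(l1, l2, theta):
    """Return an extension of theta with l1*theta == l2, or None if impossible."""
    naf1, pred1, args1 = l1
    naf2, pred2, args2 = l2
    if (naf1, pred1, len(args1)) != (naf2, pred2, len(args2)):
        return None
    theta = dict(theta)
    for s, t in zip(args1, args2):
        if is_var(s):
            if theta.setdefault(s, t) != t:
                return None
        elif s != t:
            return None
    return theta

def subsumes(r1, r2):
    """True iff head(r1)*theta == head(r2) and body(r1)*theta is a subset of body(r2)."""
    (head1, body1), (head2, body2) = r1, r2
    theta0 = match(head1, head2, {})
    if theta0 is None:
        return False

    def extend(i, theta):
        if i == len(body1):
            return True
        for candidate in body2:
            theta2 = match(body1[i], candidate, theta)
            if theta2 is not None and extend(i + 1, theta2):
                return True
        return False

    return extend(0, theta0)

# flies(X) <- sparrow(X), not ab(X)
general = (lit("flies", "X"), [lit("sparrow", "X"), lit("ab", "X", naf=True)])
# flies(X) <- sparrow(X), full_grown(X), not ab(X)
specific = (lit("flies", "X"),
            [lit("sparrow", "X"), lit("full_grown", "X"), lit("ab", "X", naf=True)])
print(subsumes(general, specific), subsumes(specific, general))
```

Matching is one-way: variables of the second rule are treated as constants, which is what subsumption requires.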

For generalization in clausal theories, least generalizations of clauses are par- 
ticularly important. The notion is defined for nonmonotonic rules as follows. 

Definition 3.2. (least generalization under subsumption) Let $\mathcal{R}$ be a finite set
of rules such that every rule in $\mathcal{R}$ has the same predicate in the head. Then, a
rule $R$ is a least generalization of $\mathcal{R}$ under $\theta$-subsumption if $R \succeq_\theta R_i$ for every
rule $R_i$ in $\mathcal{R}$, and for any other rule $R'$ satisfying $R' \succeq_\theta R_i$ for every $R_i$ in $\mathcal{R}$, it
holds that $R' \succeq_\theta R$.
In the clausal language every finite set of clauses has a least generalization.
In particular, every finite set of Horn clauses has a least generalization as a Horn
clause [33,34].^3 When we consider normal logic programs, rules are syntactically
regarded as Horn clauses by viewing an NAF-literal $\mathit{not}\ p(x)$ as an atom $not\_p(x)$
with the new predicate $not\_p$. Then the result of Horn logic programs is directly
carried over to normal logic programs.

Theorem 3.1. (existence of a least generalization) Let $\mathcal{R}$ be a finite set of rules
such that every rule in $\mathcal{R}$ has the same predicate in the head. Then, every non-empty
set $R \subseteq \mathcal{R}$ has a least generalization under $\theta$-subsumption.

A least generalization of two rules is computed as follows. First, a least
generalization of two terms $f(t_1, \ldots, t_n)$ and $g(s_1, \ldots, s_n)$ is a new variable $v$ if
$f \neq g$ (the same variable being used for every occurrence of the same pair of
terms); and is defined as $f(lg(t_1, s_1), \ldots, lg(t_n, s_n))$ if $f = g$, where $lg(t_i, s_i)$
means a least generalization of $t_i$ and $s_i$. Next, a least generalization of two
LP-literals $L_1 = (\mathit{not})\,p(t_1, \ldots, t_n)$ and $L_2 = (\mathit{not})\,q(s_1, \ldots, s_n)$ is undefined if $L_1$
and $L_2$ do not have the same predicate and sign; otherwise, it is defined as
$lg(L_1, L_2) = (\mathit{not})\,p(lg(t_1, s_1), \ldots, lg(t_n, s_n))$.

Then, a least generalization of two rules $R_1 = A_1 \leftarrow \Gamma_1$ and $R_2 = A_2 \leftarrow \Gamma_2$,
where $A_1$ and $A_2$ have the same predicate, is obtained as

$$lg(A_1, A_2) \leftarrow \Gamma$$

where $\Gamma = \{\, lg(\gamma_1, \gamma_2) \mid \gamma_1 \in \Gamma_1,\ \gamma_2 \in \Gamma_2 \text{ and } lg(\gamma_1, \gamma_2) \text{ is defined} \,\}$. In particular,
if $A_1$ and $A_2$ are empty, a least generalization of two integrity constraints
$\leftarrow \Gamma_1$ and $\leftarrow \Gamma_2$ is given by $\leftarrow \Gamma$. A least generalization of a finite set of rules is
computed by repeatedly applying the above procedure.
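The procedure above can be sketched in a few lines. This is an illustrative implementation under assumed encodings: a term is a string (constant or variable) or a tuple `(functor, arg1, ...)`, an LP-literal is `(naf, predicate, args)`, and generated variables are named `V0`, `V1`, and so on. A shared table ensures that the same pair of terms always maps to the same variable.

```python
def lg_term(t, s, table):
    """Least generalization of two terms; `table` maps term pairs to variables."""
    if t == s:
        return t
    if (isinstance(t, tuple) and isinstance(s, tuple)
            and t[0] == s[0] and len(t) == len(s)):
        return (t[0],) + tuple(lg_term(a, b, table) for a, b in zip(t[1:], s[1:]))
    if (t, s) not in table:
        table[(t, s)] = f"V{len(table)}"
    return table[(t, s)]

def lg_literal(l1, l2, table):
    """Least generalization of two LP-literals, or None when undefined."""
    naf1, pred1, args1 = l1
    naf2, pred2, args2 = l2
    if (naf1, pred1, len(args1)) != (naf2, pred2, len(args2)):
        return None
    return (naf1, pred1, tuple(lg_term(t, s, table) for t, s in zip(args1, args2)))

def lg_rule(r1, r2):
    """Least generalization of two rules whose heads share a predicate."""
    (head1, body1), (head2, body2) = r1, r2
    table = {}
    head = lg_literal(head1, head2, table)
    if head is None:
        raise ValueError("heads must have the same predicate")
    body = []
    for g1 in body1:
        for g2 in body2:
            g = lg_literal(g1, g2, table)
            if g is not None and g not in body:
                body.append(g)
    return head, body

# flies(tweety) <- bird(tweety), bird(polly), has_wing(tweety), not ab(tweety)
r1 = ((False, "flies", ("tweety",)),
      [(False, "bird", ("tweety",)), (False, "bird", ("polly",)),
       (False, "has_wing", ("tweety",)), (True, "ab", ("tweety",))])
# flies(polly) <- bird(tweety), bird(polly), sparrow(polly), not ab(polly)
r2 = ((False, "flies", ("polly",)),
      [(False, "bird", ("tweety",)), (False, "bird", ("polly",)),
       (False, "sparrow", ("polly",)), (True, "ab", ("polly",))])
head, body = lg_rule(r1, r2)
print(head, body)
```

On these two rules the sketch returns the head `flies(V0)` and a body containing `bird(V0)` and `not ab(V0)`. It also emits `bird(V1)` for the pair (polly, tweety) alongside the two ground `bird` literals; these extra literals are redundant and would be dropped by a subsequent reduction step, which the sketch does not perform.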

In ILP generalization is usually considered in relation to the background
knowledge. Plotkin [35] extends subsumption to relative subsumption for this use.
Given the background knowledge $B$ as a clausal theory, a clause $C$ subsumes $D$
relative to $B$ if there is a substitution $\theta$ such that $B \models \forall (C\theta \rightarrow D)$.

We apply relative subsumption to normal logic programs. Let $R = H \leftarrow A, \Gamma$
be a rule where $A$ is an atom and $\Gamma$ is a conjunction. Suppose that there is a
rule $A' \leftarrow \Gamma'$ in a program $P$ such that $A\theta = A'\theta$ for some substitution $\theta$. Then,
we say that the rule $(H \leftarrow \Gamma', \Gamma)\theta$ is obtained by unfolding $R$ in $P$. We also say
that $R_k$ is obtained by unfolding $R_0$ in $P$ if there is a sequence $R_0, \ldots, R_k$ of
rules such that $R_i$ ($1 \le i \le k$) is obtained by unfolding $R_{i-1}$ in $P$.

Definition 3.3. (relative subsumption) Let $P$ be an NLP, and $R_1$ and $R_2$ be
two rules. Then, $R_1$ $\theta$-subsumes $R_2$ relative to $P$ (written as $R_1 \succeq^P_\theta R_2$) if there
is a rule $R$ that is obtained by unfolding $R_1$ in $P$ and $R$ $\theta$-subsumes $R_2$. In this
case, $R_1$ is said to be more general than $R_2$ relative to $P$ under $\theta$-subsumption.

The above definition reduces to Definition 3.1 when $P$ is empty. By the
definition, relative subsumption is also defined for two rules having the same
predicate in the heads. In clausal theories, Buntine [7] introduces generalized
subsumption, which is defined between definite clauses having the same predicate
in the heads. Comparing the two definitions, Buntine’s definition is model theoretic,
while our definition is operational. Taylor [43] introduces normal subsumption,
which extends Buntine’s subsumption to normal logic programs and is defined
in a model theoretic manner.

^3 If two clauses have no predicate with the same sign in common, the empty clause
becomes the least generalization.

Example 3.1. Suppose the background program $P$, and two rules $R_1$ and $R_2$ as
follows.

$P$ : $has\_wing(x) \leftarrow bird(x), \mathit{not}\ ab(x)$,
      $bird(x) \leftarrow sparrow(x)$,
      $ab(x) \leftarrow broken\_wing(x)$.

$R_1$ : $flies(x) \leftarrow has\_wing(x)$.

$R_2$ : $flies(x) \leftarrow sparrow(x), full\_grown(x), \mathit{not}\ ab(x)$.

From $P$ and $R_1$, the rule

$R_3$ : $flies(x) \leftarrow sparrow(x), \mathit{not}\ ab(x)$

is obtained by unfolding. As $R_3$ $\theta$-subsumes $R_2$, $R_1 \succeq^P_\theta R_2$.

In clausal theories, a least generalization does not always exist under relative
subsumption. However, when background knowledge is a finite set of ground
atoms, a least generalization of two clauses can be constructed [33,35]. The result
is extended to nonmonotonic rules and is rephrased in our context as follows.
Let $P$ be a finite set of ground atoms, and $R_1$ and $R_2$ be two rules. Then, a
least generalization of these rules under relative subsumption is constructed as
a least generalization of $R'_1$ and $R'_2$ where $head(R'_i) = head(R_i)$ and $body(R'_i) =
body(R_i) \cup P$.

Example 3.2. Suppose the background program $P$, and two (positive) examples
$R_1$ and $R_2$ as follows.

$P$ : $bird(tweety) \leftarrow$, $bird(polly) \leftarrow$.

$R_1$ : $flies(tweety) \leftarrow has\_wing(tweety), \mathit{not}\ ab(tweety)$.

$R_2$ : $flies(polly) \leftarrow sparrow(polly), \mathit{not}\ ab(polly)$.

Then, $R'_1$ and $R'_2$ become

$R'_1$ : $flies(tweety) \leftarrow bird(tweety), bird(polly), has\_wing(tweety), \mathit{not}\ ab(tweety)$,

$R'_2$ : $flies(polly) \leftarrow bird(tweety), bird(polly), sparrow(polly), \mathit{not}\ ab(polly)$.

The least generalization of $R'_1$ and $R'_2$ is

$flies(x) \leftarrow bird(tweety), bird(polly), bird(x), \mathit{not}\ ab(x)$.

Removing redundant literals, it becomes

$R$ : $flies(x) \leftarrow bird(x), \mathit{not}\ ab(x)$.

In this case, it holds that $P \cup \{R\} \models_s R_i$ ($i = 1, 2$).

3.2 Inverse Resolution 

Inverse resolution [29] is based on the idea of inverting the resolution step between
clauses. There are two operators that carry out inverse resolution, absorption
and identification, which are together called the V-operators. Each
operator builds one of the two parent clauses given the other parent clause
and the resolvent. Suppose two rules $R_1 : B_1 \leftarrow \Gamma_1$ and $R_2 : A_2 \leftarrow B_2, \Gamma_2$.
When $B_1\theta_1 = B_2\theta_2$, the rule $R_3 : A_2\theta_2 \leftarrow \Gamma_1\theta_1, \Gamma_2\theta_2$ is produced by unfolding
$R_2$ with $R_1$. Absorption constructs $R_2$ from $R_1$ and $R_3$, while identification
constructs $R_1$ from $R_2$ and $R_3$ (see figure).

[Figure: the V-operators, relating the parent rules $R_1$ and $R_2 : A_2 \leftarrow B_2, \Gamma_2$ to the resolvent $R_3$.]

Given a normal logic program $P$ containing the rules $R_1$ and $R_3$, absorption
produces the program $A(P)$ such that

$$A(P) = (P \setminus \{R_3\}) \cup \{R_2\}.$$

On the other hand, given an NLP $P$ containing the rules $R_2$ and $R_3$, identification
produces the program $I(P)$ such that

$$I(P) = (P \setminus \{R_3\}) \cup \{R_1\}.$$

Note that in general there are multiple possible $A(P)$ or $I(P)$, according to the
choice of the input rules in $P$. We write $V(P)$ to mean either $A(P)$ or $I(P)$.

When $P$ is a Horn logic program, any information implied by $P$ is also implied
by $V(P)$, namely

$$V(P) \models P.$$

In this regard, the V-operators generalize a Horn logic program. In the presence 
of negation as failure in a program, however, the V-operators do not work as 
generalization operations in general. 
Example 3.3. Let $P$ be the program:

$p(x) \leftarrow \mathit{not}\ q(x)$, $q(x) \leftarrow r(x)$, $s(x) \leftarrow r(x)$, $s(a) \leftarrow$

which has the stable model $\{p(a), s(a)\}$. Absorbing the third rule into the second
rule produces $A(P)$:

$p(x) \leftarrow \mathit{not}\ q(x)$, $q(x) \leftarrow s(x)$, $s(x) \leftarrow r(x)$, $s(a) \leftarrow$

which has the stable model $\{q(a), s(a)\}$. Then, $P \models_s p(a)$ but $A(P) \not\models_s p(a)$.

A counter-example for identification is constructed in a similar manner. The 
reason is clear, since in nonmonotonic logic programs newly proven facts may 
block the derivation of other facts which are proven beforehand. As a result, the 
V-operators may not generalize the original program. Moreover, the next exam- 
ple shows that the V-operators often make a consistent program inconsistent. 

Example 3.4. Let $P$ be the program:

$p(x) \leftarrow q(x), \mathit{not}\ p(x)$, $q(x) \leftarrow r(x)$, $s(x) \leftarrow r(x)$, $s(a) \leftarrow$

which has the stable model $\{s(a)\}$. Absorbing the third rule into the second
rule produces $A(P)$:

$p(x) \leftarrow q(x), \mathit{not}\ p(x)$, $q(x) \leftarrow s(x)$, $s(x) \leftarrow r(x)$, $s(a) \leftarrow$

which has no stable model.
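Both examples can be replayed on their ground instances over the single constant $a$. The sketch below is illustrative only: atoms are abbreviated to `p`, `q`, `r`, `s`, rules are triples `(head, positive_body, negative_body)`, and `absorb` is a propositional simplification of absorption that takes $\Gamma_2$ to be the body of $R_3$ minus the body of $R_1$.

```python
from itertools import combinations

def rule(head, pos=(), neg=()):
    return (head, frozenset(pos), frozenset(neg))

def least_model(horn_rules):
    model, changed = set(), True
    while changed:
        changed = False
        for head, pos in horn_rules:
            if head not in model and pos <= model:
                model.add(head)
                changed = True
    return model

def stable_models(program, atoms):
    """Brute-force enumeration via the reduct P^M (tiny programs only)."""
    atoms = sorted(atoms)
    found = []
    for k in range(len(atoms) + 1):
        for candidate in combinations(atoms, k):
            m = set(candidate)
            reduct = [(h, pos) for h, pos, neg in program if not (neg & m)]
            if least_model(reduct) == m:
                found.append(m)
    return found

def absorb(program, r1, r3):
    """Replace r3 = A <- G1, G2 by r2 = A <- B1, G2, where r1 = B1 <- G1."""
    head1, pos1, neg1 = r1
    head3, pos3, neg3 = r3
    if not (pos1 <= pos3 and neg1 <= neg3):
        raise ValueError("body of r1 must be contained in body of r3")
    r2 = (head3, (pos3 - pos1) | {head1}, neg3 - neg1)
    return [r2 if r == r3 else r for r in program]

ATOMS = {"p", "q", "r", "s"}
q_rule, s_rule = rule("q", ["r"]), rule("s", ["r"])

# Example 3.3: p <- not q.  q <- r.  s <- r.  s.
p33 = [rule("p", neg=["q"]), q_rule, s_rule, rule("s")]
# Example 3.4: p <- q, not p.  q <- r.  s <- r.  s.
p34 = [rule("p", ["q"], ["p"]), q_rule, s_rule, rule("s")]

for program in (p33, p34):
    print(stable_models(program, ATOMS),
          stable_models(absorb(program, s_rule, q_rule), ATOMS))
```

For Example 3.3 the stable model changes from {p, s} to {q, s}, so $p(a)$ is lost; for Example 3.4 the absorbed program has no stable model at all.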

The above example shows that the V-operators can have a destructive effect on
the meaning of programs in general. It is also known that they may destroy the
syntactic structure of programs such as acyclicity and local stratification [37].

These observations call for caution in applying the V-operators to NMLP. A
condition for the V-operators to generalize an NLP is as follows.

Theorem 3.2. (conditions for the V-operators to generalize programs) [37] Let
$P$ be an NLP, and $R_1$, $R_2$, $R_3$ be the rules given at the beginning of this section. For any
NAF-literal $\mathit{not}\ L$ in $P$,

(i) if $L$ does not depend on the head of $R_3$ in $P$, then $P \models_s N$ implies $A(P) \models_s N$
for any $N \in \mathcal{HB}$.

(ii) if $L$ does not depend on the atom $B_2$ of $R_2$ in $P$, then $P \models_s N$ implies
$I(P) \models_s N$ for any $N \in \mathcal{HB}$.

Here, depends on is a transitive relation defined as: $A$ depends on $B$ if there is a
ground rule from $P$ s.t. $A$ appears in the head and $B$ appears in the body of the
rule.
Example 3.5. Suppose the background program $P$ and a (positive) example $E$
as follows.

$P$ : $flies(x) \leftarrow sparrow(x), \mathit{not}\ ab(x)$,
      $bird(x) \leftarrow sparrow(x)$,
      $sparrow(tweety) \leftarrow$, $bird(polly) \leftarrow$.

$E$ : $flies(polly)$.

Initially, $P \models_s flies(tweety)$ but $P \not\models_s flies(polly)$. Absorbing the second rule
into the first rule in $P$ produces the program $A(P)$ in which the first rule of $P$
is replaced by the following rule:

$flies(x) \leftarrow bird(x), \mathit{not}\ ab(x)$.

Then, $A(P) \models_s flies(polly)$. Notice that $A(P) \models_s flies(tweety)$ also holds.

Taylor [43] introduces a different operator called normal absorption, which 
generalizes normal logic programs. 



3.3 Inverse Entailment 

Suppose an induction problem

$$B \cup \{H\} \models E$$

where $B$ is a Horn logic program and $H$ and $E$ are each single Horn clauses.
Inverse entailment (IE) [31] is based on the idea that a possible hypothesis $H$ is
deductively constructed from $B$ and $E$ by inverting the entailment relation as

$$B \cup \{\neg E\} \models \neg H.$$

When a background theory is a nonmonotonic logic program, however, the IE 
technique cannot be used. This is because IE is based on the deduction theorem 
in first-order logic, but it is known that the deduction theorem does not hold in 
nonmonotonic logics in general [41]. 

To solve the problem, Sakama [38] introduced the entailment theorem in
normal logic programs. A nested rule is defined as

$$A \leftarrow R$$

where $A$ is an atom and $R$ is a rule of the form (1). An interpretation $I$ satisfies a
ground nested rule $A \leftarrow R$ if $I \models R$ implies $A \in I$. For an NLP $P$, $P \models_s (A \leftarrow R)$
if $A \leftarrow R$ is satisfied in every stable model of $P$.

Theorem 3.3. (entailment theorem [38]) Let $P$ be an NLP and $R$ a rule such
that $P \cup \{R\}$ is consistent. For any ground atom $A$, $P \cup \{R\} \models_s A$ implies
$P \models_s A \leftarrow R$. Conversely, $P \models_s A \leftarrow R$ and $P \models_s R$ imply $P \cup \{R\} \models_s A$.
The entailment theorem corresponds to the deduction theorem and is used
for inverting entailment in normal logic programs.

Theorem 3.4. (IE in normal logic programs [38]) Let $P$ be an NLP and $R$ a
rule such that $P \cup \{R\}$ is consistent. For any ground LP-literal $L$, if $P \cup \{R\} \models_s L$
and $P \models_s\ \leftarrow L$, then $P \models_s \mathit{not}\ R$.

Thus, the relation

$$P \models_s \mathit{not}\ R \qquad (2)$$

provides a necessary condition for computing a rule $R$ satisfying $P \cup \{R\} \models_s L$
and $P \models_s\ \leftarrow L$. When $L$ is an atom (resp. NAF-literal), it represents a positive
(resp. negative) example. The condition $P \models_s\ \leftarrow L$ states that the example $L$ is
initially false in every stable model of $P$. To simplify the problem, a program $P$
is assumed to be function-free and categorical in the rest of this section.

Given two ground LP-literals $L_1$ and $L_2$, the relation $L_1 \sim L_2$ is defined if
$pred(L_1) = pred(L_2)$ with a predicate of arity $\ge 1$ and $const(L_1) = const(L_2)$.
Let $L$ be a ground LP-literal and $S$ a set of ground LP-literals. Then, $L_1$ in $S$ is
relevant to $L$ if either (i) $L_1 \sim L$ or (ii) $L_1$ shares a constant with an LP-literal $L_2$
in $S$ such that $L_2$ is relevant to $L$.

Let $P$ be a program with the unique stable model $M$ and $A$ a ground atom
representing a positive example. Suppose that the relations $P \cup \{R\} \models_s A$ and
$P \models_s\ \leftarrow A$ hold. By Theorem 3.4, the relation (2) holds, thereby

$$M \not\models R. \qquad (3)$$

Then, we start to find a rule $R$ satisfying the condition (3). Consider the integrity
constraint $\leftarrow \Gamma$ where $\Gamma$ consists of ground LP-literals in $M^+$ which are relevant
to the positive example $A$.^5 Since $M$ does not satisfy this integrity constraint,

$$M \not\models\ \leftarrow \Gamma \qquad (4)$$

holds. That is, $\leftarrow \Gamma$ is a rule which satisfies the condition (3).

Next, by $P \models_s\ \leftarrow A$, it holds that $A \notin M$, thereby $\mathit{not}\ A \in M^+$. Since $\mathit{not}\ A$
is relevant to $A$, the integrity constraint $\leftarrow \Gamma$ contains $\mathit{not}\ A$ in its body. Then,
shifting the atom $A$ to the head produces

$$A \leftarrow \Gamma' \qquad (5)$$

where $\Gamma' = \Gamma \setminus \{\mathit{not}\ A\}$.

Finally, the rule (5) is generalized by constructing a rule $R^*$ such that $R^*\theta =
A \leftarrow \Gamma'$ for some substitution $\theta$. It is verified that the rule $R^*$ satisfies the
condition (2), i.e., $P \models_s \mathit{not}\ R^*$.

The next theorem presents a sufficient condition for the correctness of $R^*$ to
induce $A$.

^5 Since $P$ is function-free, $\Gamma$ consists of finitely many LP-literals.
Theorem 3.5. (correctness of the IE rule [39]) Let $P$ be a function-free and
categorical NLP, $A$ a ground atom, and $R^*$ a rule obtained as above. If $P \cup \{R^*\}$
is consistent and $pred(A)$ does not appear in $P$, then $P \cup \{R^*\} \models_s A$.

Example 3.6. Let $P$ be the program

$bird(x) \leftarrow penguin(x)$,
$bird(tweety) \leftarrow$, $penguin(polly) \leftarrow$.

Given the example $L = flies(tweety)$, it holds that $P \models_s\ \leftarrow flies(tweety)$. Our
goal is then to construct a rule $R$ satisfying $P \cup \{R\} \models_s L$.

First, the set $M^+$ of LP-literals becomes

$M^+ = \{\, bird(tweety),\ bird(polly),\ penguin(polly),$
$\qquad \mathit{not}\ penguin(tweety),\ \mathit{not}\ flies(tweety),\ \mathit{not}\ flies(polly) \,\}$.

Picking up the LP-literals which are relevant to $L$, the integrity constraint

$\leftarrow bird(tweety), \mathit{not}\ penguin(tweety), \mathit{not}\ flies(tweety)$

is constructed. Next, shifting $flies(tweety)$ to the head produces

$flies(tweety) \leftarrow bird(tweety), \mathit{not}\ penguin(tweety)$.

Finally, replacing $tweety$ by a variable $x$, the rule

$R^*$ : $flies(x) \leftarrow bird(x), \mathit{not}\ penguin(x)$

is obtained, where $P \cup \{R^*\} \models_s L$ holds.
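The construction steps of this example can be traced in code. The following sketch takes the set $M^+$ from the example as given input rather than computing it, encodes an LP-literal as `(naf, predicate, args)`, and generalizes by replacing each constant with a fresh variable named `X0`, `X1`, and so on; that generalization is only one possible choice of $R^*$, and all names are illustrative.

```python
def consts(literal):
    return set(literal[2])

def relevant(m_plus, example):
    """LP-literals of M+ relevant to the example: seed with literals L1 ~ example,
    then close under sharing a constant with an already relevant literal."""
    _, pred, args = example
    rel = [l for l in m_plus
           if l[1] == pred and len(args) >= 1 and consts(l) == set(args)]
    changed = True
    while changed:
        changed = False
        for l in m_plus:
            if l not in rel and any(consts(l) & consts(r) for r in rel):
                rel.append(l)
                changed = True
    return rel

def induce(m_plus, example):
    """Build the constraint <- Gamma, shift the example to the head, generalize."""
    gamma = relevant(m_plus, example)
    naf_example = (True, example[1], example[2])
    gamma_prime = [l for l in gamma if l != naf_example]   # Gamma \ {not A}
    variables = {}

    def generalize(literal):
        naf, pred, args = literal
        new_args = tuple(variables.setdefault(c, f"X{len(variables)}") for c in args)
        return (naf, pred, new_args)

    return generalize(example), [generalize(l) for l in gamma_prime]

M_PLUS = [
    (False, "bird", ("tweety",)), (False, "bird", ("polly",)),
    (False, "penguin", ("polly",)),
    (True, "penguin", ("tweety",)), (True, "flies", ("tweety",)),
    (True, "flies", ("polly",)),
]
EXAMPLE = (False, "flies", ("tweety",))
print(induce(M_PLUS, EXAMPLE))
```

On this input the sketch yields the head `flies(X0)` with body `bird(X0), not penguin(X0)`, matching $R^*$ above up to the variable name.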

The inverse entailment algorithm is also used for learning programs by neg- 
ative examples [38]. 

3.4 Other Techniques 

This section reviews other techniques for learning nonmonotonic logic programs. 

Bain and Muggleton [2] introduce the algorithm called Closed World Specialization
(CWS). In the algorithm, an initial program and an intended interpretation
that a learned program should satisfy are given. In this setting, any
atom which is not included in the interpretation is considered false. For instance,
suppose the program:

$P$ : $flies(x) \leftarrow bird(x)$,
      $bird(eagle) \leftarrow$, $bird(emu) \leftarrow$,

and the intended interpretation:

$M$ : $\{\, flies(eagle),\ bird(eagle),\ bird(emu) \,\}$,

where $flies(emu)$ is not in $M$ and is interpreted false. As $P$ implies $flies(emu)$,
the CWS algorithm specializes $P$ and produces

$flies(x) \leftarrow bird(x), \mathit{not}\ ab(x)$,
$bird(eagle) \leftarrow$, $bird(emu) \leftarrow$, $ab(emu) \leftarrow$.

Here, $ab(x)$ is a newly introduced atom.^6 In this algorithm NAF is used for
specializing Horn clauses and the CWS produces normal logic programs.
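The specialization step in this example can be illustrated with a toy sketch. It is not Bain and Muggleton's algorithm: it assumes a single rule with one unary body predicate, facts encoded as `(predicate, constant)` pairs, and an invented predicate named `ab` that is defined by facts only.

```python
def specialize(head, body, facts, intended, ab="ab"):
    """Add 'not ab(x)' to head(x) <- body(x) and an ab fact for every
    instance that the rule derives but the intended interpretation rejects."""
    covered = sorted(c for p, c in facts if p == body)
    exceptions = [c for c in covered if (head, c) not in intended]
    if not exceptions:
        return (head, [body], []), set(facts)
    return (head, [body], [ab]), set(facts) | {(ab, c) for c in exceptions}

def derive(rule, facts):
    """Instances of the rule head, with NAF evaluated against the facts."""
    head, pos, neg = rule
    constants = {c for _, c in facts}
    return {(head, c) for c in constants
            if all((p, c) in facts for p in pos)
            and not any((n, c) in facts for n in neg)}

FACTS = {("bird", "eagle"), ("bird", "emu")}
INTENDED = {("flies", "eagle"), ("bird", "eagle"), ("bird", "emu")}

new_rule, new_facts = specialize("flies", "bird", FACTS, INTENDED)
print(new_rule, sorted(new_facts), derive(new_rule, new_facts))
```

After specialization, only `flies(eagle)` is derived, in agreement with the intended interpretation.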

Inoue and Kudoh [19] propose an algorithm called LELP which learns extended
logic programs (ELP) under the answer set semantics. The algorithm
is close to Bain and Muggleton’s method but differs from it in
that [19] uses Open World Specialization (OWS) rather than the CWS under the
3-valued setting. The OWS does not use the closed world assumption to identify
negative instances of the target concept.

Given positive and negative examples, LELP first constructs (monotonic)
rules that cover positive examples by using an ordinary ILP algorithm,^7 then
generates default rules to uncover negative examples by incorporating NAF literals
into the bodies of rules. In addition, exceptions to rules are identified from
negative examples and are then generalized to default cancellation rules. In LELP,
hierarchical defaults can be learned by recursively calling the exception identification
algorithm. Moreover, when some instances are possibly classified as both
positive and negative, nondeterministic rules can also be learned so that there
are multiple answer sets for the resulting program. Lamma et al. [22] formalize
the same problem under the well-founded semantics. In their algorithms, different
levels of generalization are strategically combined in order to learn solutions
for positive and negative concepts.

Dimopoulos and Kakas [12] construct default rules with exceptions. For instance,
suppose the background program:

$P$ : $bird(x) \leftarrow penguin(x)$,
      $penguin(x) \leftarrow super\_penguin(x)$,
      $bird(a) \leftarrow$, $bird(b) \leftarrow$,
      $penguin(c) \leftarrow$, $super\_penguin(d) \leftarrow$,

and the positive and negative examples:

$E^+$ : $flies(a)$, $flies(b)$, $flies(d)$.

$E^-$ : $flies(c)$.

Their algorithm first computes a rule which covers all the positive examples:

$r_1$ : $flies(x) \leftarrow bird(x)$.

Since this rule also covers the negative example, the algorithm next computes a
rule which explains the negative example:

$r_2$ : $\neg flies(x) \leftarrow penguin(x)$.

In order to avoid drawing contradictory conclusions on $c$, the rule $r_2$ is given
priority over $r_1$. Likewise, the algorithm next computes the rule

$r_3$ : $flies(x) \leftarrow super\_penguin(x)$

and $r_3$ is given priority over $r_2$. A unique feature of their algorithm is that
they learn rules using an ordinary ILP algorithm, and represent exceptions by a
prioritized hierarchy without using NAF.

^6 Such an atom is called invented.

^7 An “Ordinary ILP” means any top-down/bottom-up ILP algorithm which is used
in clausal logic.
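The effect of the priorities can be illustrated by a small sketch. It is a simplification assumed here for illustration, not the semantics used in [12]: the background program is closed under its two rules, and for each individual the applicable rule of highest priority decides the conclusion.

```python
def close(facts):
    """Close the facts under penguin <- super_penguin and bird <- penguin."""
    facts = set(facts)
    facts |= {("penguin", c) for p, c in facts if p == "super_penguin"}
    facts |= {("bird", c) for p, c in facts if p == "penguin"}
    return facts

# Learned rules, highest priority first: (body predicate, conclusion about flies).
PRIORITIZED = [
    ("super_penguin", True),   # r3: flies(x) <- super_penguin(x)
    ("penguin", False),        # r2: not-flies(x) <- penguin(x)
    ("bird", True),            # r1: flies(x) <- bird(x)
]

def flies(individual, facts):
    """Conclusion of the highest-priority applicable rule, or None if none applies."""
    for body, conclusion in PRIORITIZED:
        if (body, individual) in facts:
            return conclusion
    return None

BACKGROUND = close({("bird", "a"), ("bird", "b"),
                    ("penguin", "c"), ("super_penguin", "d")})
print({c: flies(c, BACKGROUND) for c in "abcd"})
```

With this ordering the positive examples $flies(a)$, $flies(b)$, $flies(d)$ are covered and the negative example $flies(c)$ is not.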

Sakama [39] presents a method of computing inductive hypotheses using answer
sets of extended logic programs. Given an ELP $P$ and a ground literal $L$,
suppose a rule $R$ satisfying $P \cup \{R\} \models_{AS} L$, where $\models_{AS}$ is an entailment under
the answer set semantics. It is shown that this relation together with $P \not\models_{AS} L$
implies $P \not\models_{AS} R$. This provides a necessary condition for any possible hypothesis
$R$ which explains $L$. A candidate hypothesis is then obtained by computing
answer sets of $P$, and constructing a rule which is unsatisfied in an answer set.
The method provides the same result as [38] in a much simpler manner. In
function-free stratified programs the algorithm constructs inductive hypotheses
in polynomial time.

Bergadano et al. [4] propose the system called $\mathrm{TRACY}^{not}$ which learns NLPs
using the derivation information of examples. In this system candidate hypotheses
are given as input, and from those candidates the system
selects hypotheses which cover/uncover positive/negative examples. Martin and
Vrain [25] introduce an algorithm to learn NLPs under the 3-valued semantics.
Given a 3-valued model of a background program, it constructs (possibly recursive)
rules to explain examples. Seitzer [40] proposes a system called INDED. It
consists of a deductive engine which computes stable models or the well-founded
model of a background NLP, and an inductive engine which induces hypotheses
using the computed models and positive/negative examples. It can learn unstratified
programs. Fogel and Zaverucha [16] propose an algorithm for learning
strict and call-consistent NLPs, which effectively searches the hypothesis space
using subsumption and iteratively constructed training examples.

Finally, the algorithms presented in this paper are summarized in Table 1.

Table 1. Comparison of Algorithms

Learned Programs   Algorithms                      References
----------------   -----------------------------   -----------
NLP                Ordinary ILP + specialization   [2]
                   Selection from candidates       [4]
                   Top-down                        [16,25,40]
                   Inverse resolution              [37,43]
                   Inverse entailment              [38]
                   Least generalization            Section 3.1
ELP                Ordinary ILP                    [12]
                   Ordinary ILP + specialization   [19,22]
                   Computing answer sets           [39]

For related research, learning abductive logic programs [13,20,21,23] and
learning action theories [24] are important applications of NMILP.

4 Summary and Open Issues

We presented an overview of techniques for realizing induction in nonmonotonic
logic programs. Techniques in ILP have been centered on clausal logic so far,
especially on Horn logic. However, as nonmonotonic logic programs are different
from classical logic, existing techniques are not directly applicable to
nonmonotonic situations. In contrast to clausal ILP, the field of nonmonotonic
ILP is less explored and several issues remain open. Such issues include:

- Generalization under implication: In Section 3.1, we introduced the
subsumption order between rules and provided an algorithm for computing a least
generalization, which is an easy extension of the one in clausal logic. On the
other hand, in clausal theories there is another generalization based on the
implication order, which uses the entailment relation $C_1 \models C_2$ between
two clauses $C_1$ and $C_2$. The result of clausal logic concerning
generalizations under implication, however, is not directly applicable to NMLP.
This is because the entailment relation in NMLP is considered under the
commonsense semantics, which is different from the classical entailment
relation. For instance, under the stable model semantics, the relation
$\models_s$ is used instead of $\models$. Generality relations under
implication would have properties different from the subsumption order, and the
existence of least generalizations and their computability are to be examined.

- Generalization operations in nonmonotonic logic programs: In clausal
theories, operations by inverting resolution generalize programs, but as
presented in Section 3.2, they do not generalize programs in nonmonotonic
situations in general. It is therefore important to develop program
transformations which generalize nonmonotonic logic programs (under particular
semantics) in general. Such transformations would serve as fundamental
operations in nonmonotonic ILP. An example of this kind of transformation is
seen in [43].

- Relations between induction and other commonsense reasoning: Induction is a
kind of nonmonotonic inference, hence theoretical relations between induction
and other nonmonotonic formalisms, including nonmonotonic logic programming,
are of interest. Such relations will enable us to implement ILP in terms of
NMLP, and also open possibilities to integrate induction and commonsense
reasoning. Research in this direction is found in [1,14].

Ten years have passed since the first LPNMR conference was held in 1991.
In [32] the preface says: 

. . . there has been growing interest in the relationship between logic pro- 
gramming semantics and non-monotonic reasoning. It is now reasonably 
clear that there is ample scope for each of these areas to contribute to 
the other. 

As a concluding remark, we apply the same statement to NMLP and ILP.
Combining NMLP and ILP in the framework of nonmonotonic inductive logic
programming is an important step towards a better knowledge representation
tool, and will bring fruitful advances in each field.

Abstract. Logic programs P and Q are strongly equivalent if, given
any logic program R, programs $P \cup R$ and $Q \cup R$ are equivalent (that
is, have the same answer sets). Strong equivalence is convenient for the 
study of equivalent transformations of logic programs; one can prove 
that a local change is correct without considering the whole program. 
Recently, Lifschitz, Pearce and Valverde showed that Heyting’s logic of 
here-and-there can be used to characterize strong equivalence of logic 
programs. This paper offers a more direct characterization, and extends 
it to default logic. In their paper, Lifschitz, Pearce and Valverde study a 
very general form of logic programs, called “nested” programs. For the 
study of strong equivalence of default theories, it is convenient to intro- 
duce a corresponding “nested” version of default logic, which generalizes 
Reiter’s default logic. 



1 Introduction 



Logic programs P and Q are “strongly equivalent” if, given any logic
program R, $P \cup R$ and $Q \cup R$ are equivalent. Recent work by Lifschitz,
Pearce and Valverde [4] uses Heyting’s logic of here-and-there to characterize
strong equivalence of logic programs under the answer set semantics [1,3].
Their proof utilizes Pearce’s equilibrium logic [5,6]. In the current paper,
strong equivalence of logic programs is characterized more directly, in terms
of concepts used in the definition of answer sets; no knowledge of the logic of
here-and-there is required. This simplifies the proof of the main strong
equivalence theorem, and may also make the result easier to apply to specific
cases. Moreover, this alternative characterization of strong equivalence is
easily extended to Reiter’s default logic [7].

Strong equivalence can help us reason about correctness of logic programs
and default theories. For example, as discussed in [4], it can be used to
establish the fact that in any logic program with a constraint of the form

    $\bot \leftarrow F, G$,

the disjunctive rule

    $F ; G \leftarrow \top$

can be replaced by the pair of rules

    $F \leftarrow \mathit{not}\ G$
    $G \leftarrow \mathit{not}\ F$

without affecting the program’s answer sets.

Lifschitz, Pearce and Valverde consider strong equivalence for a very general 
form of logic programs, called “nested” programs, introduced by Lifschitz, Tang 
and Turner [3]. For the study of strong equivalence of default theories, it is 
convenient to introduce similarly general “nested” default theories. 

Section 2 reviews definitions for nested logic programming. Section 3 states
and proves a simple characterization of strong equivalence for logic programs.
Section 4 makes precise the relationship between our strong equivalence theorem
and that obtained using the logic of here-and-there. Section 5 briefly
investigates strongly equivalent transformations of logic programs. Taking
advantage of the strong similarities between definitions for logic programming
and default logic, Section 6 introduces “nested” default logic, and shows that
it extends both nested logic programming and disjunctive default logic [2],
which in turn extends Reiter’s default logic. Section 7 states a
characterization of strong equivalence for nested default theories similar to
that for nested logic programs. Section 8 briefly investigates strongly
equivalent transformations of default theories. Proofs related to nested
default logic appear in Section 9.

2 Nested Logic Programming 

This paper employs the definition of logic programs from [3], although the
presentation differs in some details.

2.1 Syntax 

The words atom and literal are understood here as in propositional logic.
Elementary formulas are literals and the 0-place connectives $\bot$ (“false”)
and $\top$ (“true”). NLP formulas are built from elementary formulas using the
unary connective not and the binary connectives , (conjunction) and ;
(disjunction). An NLP rule is an expression of the form

    $F \leftarrow G$

where F and G are NLP formulas, called the head and the body of the rule.

A nested logic program is a set of NLP rules.

When convenient, a rule $F \leftarrow \top$ is identified with the formula F.

2.2 Semantics 

Let us first define recursively when a consistent set X of literals satisfies
an NLP formula F (symbolically, $X \models F$), as follows.

- For elementary F, $X \models F$ iff $F \in X$ or $F = \top$.

- $X \models (F, G)$ iff $X \models F$ and $X \models G$.

- $X \models (F; G)$ iff $X \models F$ or $X \models G$.

- $X \models \mathit{not}\ F$ iff $X \not\models F$.

A consistent set X of literals is closed under a program P if, for every rule
$F \leftarrow G$ in P, $X \models F$ whenever $X \models G$.

The reduct of a formula F relative to a consistent set X of literals (written
$F^X$) is obtained by replacing every maximal occurrence in F of a formula of
the form not G with $\bot$ if $X \models G$ and with $\top$ otherwise. The
reduct of a program P relative to X (written $P^X$) is obtained by replacing
the head and body of each rule in P by their reducts relative to X.

A consistent set X of literals is an answer set for P if it is minimal among
the consistent sets of literals closed under $P^X$.

As discussed in [3], this definition agrees with previous versions of the
answer set semantics on consistent answer sets (but does not allow for an
inconsistent one).
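
The definitions of Section 2.2 can be tried out on small programs. The following is a minimal brute-force sketch, not an implementation from the paper. It assumes a hypothetical encoding in which a literal is a string such as "p" or "-p", the connectives are tagged tuples, and a rule is a (head, body) pair; the function names are likewise illustrative.

```python
from itertools import combinations

# Hypothetical encoding (an assumption of this sketch, not the paper's notation):
#   literal: "p" or "-p"; TOP/BOT: the 0-place connectives;
#   ("not", F), ("and", F, G), ("or", F, G); a rule is a pair (head, body).
TOP, BOT = ("top",), ("bot",)


def sat(X, F):
    """Return True iff the set of literals X satisfies the NLP formula F."""
    if isinstance(F, str):
        return F in X
    if F == TOP:
        return True
    if F == BOT:
        return False
    if F[0] == "not":
        return not sat(X, F[1])
    if F[0] == "and":
        return sat(X, F[1]) and sat(X, F[2])
    if F[0] == "or":
        return sat(X, F[1]) or sat(X, F[2])
    raise ValueError(f"unknown connective: {F[0]!r}")


def reduct(F, X):
    """Replace each maximal occurrence of `not G` in F by BOT or TOP."""
    if isinstance(F, str) or F in (TOP, BOT):
        return F
    if F[0] == "not":
        return BOT if sat(X, F[1]) else TOP
    return (F[0], reduct(F[1], X), reduct(F[2], X))


def closed(X, P):
    """X is closed under P if it satisfies the head of every rule whose body it satisfies."""
    return all(sat(X, head) for head, body in P if sat(X, body))


def consistent(X):
    """No atom occurs in X together with its classical negation."""
    return all("-" + lit not in X for lit in X if not lit.startswith("-"))


def subsets(S):
    items = sorted(S)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def answer_sets(P, literals):
    """Enumerate answer sets of P among the subsets of `literals` (exponential)."""
    found = []
    for X in subsets(literals):
        if not consistent(X):
            continue
        PX = [(reduct(h, X), reduct(b, X)) for h, b in P]
        # Minimality: X is closed under the reduct and no proper subset is.
        if closed(X, PX) and not any(closed(Z, PX) for Z in subsets(X) if Z < X):
            found.append(X)
    return found


# The pair of rules  p <- not q,  q <- not p.
program = [("p", ("not", "q")), ("q", ("not", "p"))]
print(answer_sets(program, {"p", "q"}))
```

Enumerating all subsets is only practical for a handful of literals; the sketch is meant to mirror the definitions, not to be efficient.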



3 Strong Equivalence of Logic Programs 

Logic programs P and Q are equivalent if they have the same answer sets. They
are strongly equivalent if, for any logic program R, $P \cup R$ and $Q \cup R$
are equivalent.

The following terminology is convenient. For program P, and consistent
sets X, Y of literals with $X \subseteq Y$, call the pair $(X, Y)$ an SE-model
of P if both X and Y are closed under $P^Y$.

In Section 4, we will see that SE-models correspond to models in the logic of 
here-and-there. 

Theorem 1. Logic programs are strongly equivalent iff they have the same 
SE-models. 

Proof. Right to left: Assume that programs P and Q have the same SE-models.
Take any program R. We need to show that $P \cup R$ and $Q \cup R$ are
equivalent. Assume that X is an answer set for $P \cup R$. That is, X is a
consistent set of literals closed under $(P \cup R)^X$, and no proper subset
of X is closed under $(P \cup R)^X$. Since $(P \cup R)^X = P^X \cup R^X$, X is
closed under both $P^X$ and $R^X$. Since X is closed under $P^X$, it follows
by assumption that X is closed under $Q^X$. So X is closed under
$Q^X \cup R^X = (Q \cup R)^X$. Suppose a proper subset of X is closed under
$(Q \cup R)^X$. Then it is closed under both $Q^X$ and $R^X$. By assumption it
is also closed under $P^X$, and thus under $(P \cup R)^X$, contradicting the
choice of X. We conclude that every answer set for $P \cup R$ is an answer set
for $Q \cup R$. By symmetry, every answer set for $Q \cup R$ is an answer set
for $P \cup R$.

Left to right: Assume (wlog) that $(X, Y)$ is an SE-model of program P but
not of program Q. We need to show that P and Q are not strongly equivalent.
Consider two cases.



Hudson Turner 



Case 1: Y is not closed under $Q^Y$. Then Y is not closed under
$(Q \cup Y)^Y = Q^Y \cup Y$, and so is not an answer set for $Q \cup Y$. On
the other hand, one easily verifies that Y is an answer set for $P \cup Y$.
Hence P and Q are not strongly equivalent.

Case 2: Y is closed under $Q^Y$. Take
$R = X \cup \{F \leftarrow G : F, G \in Y \setminus X\}$. Clearly Y is closed
under $(Q \cup R)^Y$. Let Z be a subset of Y closed under
$(Q \cup R)^Y = Q^Y \cup R$. By choice of R, $X \subseteq Z$, and by
assumption X is not closed under $Q^Y$, so $X \neq Z$. Hence some
$L \in Y \setminus X$ belongs to Z. It follows by choice of R that
$Y \setminus X \subseteq Z$. Consequently $Z = Y$, and so Y is an answer set
for $Q \cup R$. On the other hand, X is a proper subset of Y that is closed
under $(P \cup R)^Y = P^Y \cup R$. So Y is not an answer set for $P \cup R$,
and we conclude again that P and Q are not strongly equivalent. □

Although simpler (due to simpler definitions), this proof resembles in many
details the proof of the main theorem in [4], including the fact that it
demonstrates that if logic programs P and Q are not strongly equivalent then
they can be distinguished by adding a logic program in which the head of each
rule is a literal and the body of each rule is either a literal or $\top$.
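
Theorem 1 and the Case 2 construction can be illustrated on a small instance. The sketch below is an illustration only, under the same kind of hypothetical tuple encoding (atoms as strings, rules as (head, body) pairs) and restricted to programs without classical negation. It enumerates SE-models by brute force, finds a pair $(X, Y)$ that is an SE-model of one program but not of the other, and builds the distinguishing program R described in the proof.

```python
from itertools import combinations

TOP, BOT = ("top",), ("bot",)  # hypothetical encoding of the 0-place connectives


def sat(X, F):
    """X |= F; atoms are strings, other formulas are tagged tuples."""
    if isinstance(F, str):
        return F in X
    if F[0] in ("top", "bot"):
        return F[0] == "top"
    if F[0] == "not":
        return not sat(X, F[1])
    if F[0] == "and":
        return sat(X, F[1]) and sat(X, F[2])
    if F[0] == "or":
        return sat(X, F[1]) or sat(X, F[2])
    raise ValueError(f"unknown connective: {F[0]!r}")


def reduct(P, X):
    """Reduct of a program: each maximal `not G` becomes BOT if X |= G, else TOP."""
    def red(F):
        if isinstance(F, str) or F[0] in ("top", "bot"):
            return F
        if F[0] == "not":
            return BOT if sat(X, F[1]) else TOP
        return (F[0], red(F[1]), red(F[2]))
    return [(red(h), red(b)) for h, b in P]


def closed(X, P):
    return all(sat(X, h) for h, b in P if sat(X, b))


def subsets(S):
    items = sorted(S)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def answer_sets(P, atoms):
    result = []
    for X in subsets(atoms):
        PX = reduct(P, X)
        if closed(X, PX) and not any(closed(Z, PX) for Z in subsets(X) if Z < X):
            result.append(X)
    return result


def se_models(P, atoms):
    """All pairs (X, Y) with X a subset of Y and both closed under the reduct of P by Y."""
    models = set()
    for Y in subsets(atoms):
        PY = reduct(P, Y)
        if closed(Y, PY):
            models |= {(X, Y) for X in subsets(Y) if closed(X, PY)}
    return models


def case2_witness(X, Y):
    """R = X together with all rules F <- G for F, G in Y minus X (Case 2 of the proof)."""
    rest = sorted(Y - X)
    return [(a, TOP) for a in sorted(X)] + [(f, g) for f in rest for g in rest]


atoms = {"p", "q"}
P = [("p", ("not", "q")), ("q", ("not", "p"))]   # p <- not q,  q <- not p
Q = [(("or", "p", "q"), TOP)]                    # p ; q <- TOP

difference = se_models(P, atoms) - se_models(Q, atoms)
(X, Y), = difference                             # exactly one pair in this instance
R = case2_witness(X, Y)
print(answer_sets(P + R, atoms), answer_sets(Q + R, atoms))
```

In this instance Y is closed under the reduct of Q, so the Case 2 construction applies; without the constraint from the introduction, the two programs are distinguished by the added rules.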

4 HT-Models and the Logic of Here-and-There 

Lifschitz, Pearce and Valverde identify logic program rules with formulas in the 
logic of here-and-there, and show that programs are strongly equivalent iff they 
are equivalent in the logic of here-and-there. 

They consider nested programs, as described in Section 2, except that they
do not allow classical negation. (That is, their programs do not contain the
symbol $\neg$.) Accordingly, they define answer sets using sets of atoms in
place of consistent sets of literals. For convenience, the term “stable model”
will be used to refer to an answer set in their sense.

After establishing their strong equivalence theorem (with respect to stable
models) for nested programs without classical negation, they explain that the
result can be extended to all nested programs as follows. Take any nested
program P. For each atom A in the language of P, add a new atom A', and let P'
be the program in this extended language obtained by (i) replacing each
occurrence of each negative literal $\neg A$ with the atom A', and (ii) adding
the rule $\bot \leftarrow A, A'$ for every new atom A'. The answer sets for P
are in one-to-one correspondence with the stable models of P'. More precisely,
given any set X of literals, let X' be obtained by replacing each negative
literal $\neg A \in X$ by A'. Then X is an answer set for P iff X' is a stable
model of P'.
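
The translation from P to P' is purely syntactic, so it is easy to sketch. The code below is a hedged illustration rather than code from [4]: it assumes a hypothetical encoding in which a negative literal is written "-a", the new atom is written "a'", formulas are tagged tuples, and rules are (head, body) pairs. The function names are invented for the example.

```python
TOP, BOT = ("top",), ("bot",)  # hypothetical encoding of the 0-place connectives


def prime(F):
    """Replace every negative literal "-a" occurring in F by the new atom "a'"."""
    if isinstance(F, str):
        return F[1:] + "'" if F.startswith("-") else F
    if F in (TOP, BOT):
        return F
    return (F[0],) + tuple(prime(G) for G in F[1:])


def eliminate_classical_negation(P, atoms):
    """Build P': translate every rule, then add BOT <- a, a' for each atom a."""
    translated = [(prime(h), prime(b)) for h, b in P]
    constraints = [(BOT, ("and", a, a + "'")) for a in sorted(atoms)]
    return translated + constraints


def prime_set(X):
    """Map a set X of literals to the corresponding set X' of atoms."""
    return frozenset(prime(lit) for lit in X)


# Example: the one-rule program  -p <- not p  over the atom p.
P = [("-p", ("not", "p"))]
print(eliminate_classical_negation(P, {"p"}))
print(sorted(prime_set({"-p", "q"})))
```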

It follows that nested programs P and Q are strongly equivalent (in the 
sense of this paper) iff P' and Q' are strongly equivalent wrt stable models. 
Moreover, for any nested programs P and Q without classical negation, P and Q 
are strongly equivalent wrt stable models iff P' and Q' are. 

In [4], an HT-interpretation is a pair $(I^H, I^T)$ of sets of atoms, with
$I^H \subseteq I^T$. Without going into details, we can observe that they
define when an HT-interpretation is a model of a logic program in the sense of
the logic of here-and-there. Although it is not done here, one can easily
verify that their Lemmas 1 and 2 together imply the following.

Proposition 1. For any nested logic program P, $(X, Y)$ is an SE-model of P
iff $(X', Y')$ is a model of P' in the logic of here-and-there.

So these approaches are essentially equivalent with regard to logic programs. 
Each has advantages. 

The primary advantage of the approach introduced here is its relative sim- 
plicity. The definition of SE-model is quite straightforward, based on concepts 
already introduced in the definition of answer sets. This in turn simplifies the 
proof of the strong equivalence theorem. Moreover, the (relatively) simple defi- 
nition can make the theorem easier to apply to specific cases. 

The definition introduced in this paper takes advantage of the special status
of the symbol $\leftarrow$ in definitions of logic programming. By comparison,
the logic of here-and-there treats $\leftarrow$ as just another connective, and
even defines not in terms of it: not F is understood as an abbreviation for
$\bot \leftarrow F$. The possibility of nested occurrences of $\leftarrow$
complicates the truth definition considerably.

It is important to note, though, that this complication takes a familiar form:
the truth definition in the logic of here-and-there uses standard Kripke
models. In fact, they are a special case of Kripke models for intuitionistic
logic (which is, accordingly, slightly weaker). Thus, such an approach brings
with it a range of associations that may help clarify intuitions about the
meaning of connectives $\leftarrow$ and not in logic programming.

Even if we consider only convenience in the study of strong equivalence (or 
similar properties), the logic of here-and-there offers a potential advantage: it is 
a logic with known identities, deduction rules, and such, which can be used to 
establish strong equivalence in particular cases. 

Nonetheless, when we wish to apply strong equivalence results, it seems likely 
that a model-theoretic argument using the definition from this paper will often 
be easier than a proof-theoretic argument using known properties of the logic of 
here-and-there. 

5 Equivalent Transformations of Logic Programs 

To demonstrate the use of Theorem 1, let us consider again the example from
the introduction: for any NLP formulas F and G, programs $P_1$ and $P_2$ below
have the same SE-models.

    $P_1$:  $F ; G \leftarrow \top$        $P_2$:  $F \leftarrow \mathit{not}\ G$
            $\bot \leftarrow F, G$                 $G \leftarrow \mathit{not}\ F$
                                                   $\bot \leftarrow F, G$

To see this, take any pair $(X, Y)$ of consistent sets of literals such that
$X \subseteq Y$, and consider four cases.

Case 1: $Y \models (F, G)^Y$. Then Y is not closed under either of $P_1^Y$ or
$P_2^Y$, so $(X, Y)$ is not an SE-model of $P_1$ or $P_2$.

Case 2: $Y \models (F, \mathit{not}\ G)^Y$. So Y is closed under both $P_1^Y$
and $P_2^Y$. Notice that not does not occur in $G^Y$. It follows that since
$Y \not\models G^Y$ and $X \subseteq Y$, $X \not\models G^Y$. We can conclude
that X is closed under $P_1^Y$ iff $X \models F^Y$ iff X is closed under
$P_2^Y$. So $(X, Y)$ is an SE-model of $P_1$ iff it is an SE-model of $P_2$.

Case 3: $Y \models (\mathit{not}\ F, G)^Y$. Symmetric to previous case.

Case 4: $Y \models (\mathit{not}\ F, \mathit{not}\ G)^Y$. Similar to first
case.
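
The case analysis can be cross-checked mechanically on concrete instances of F and G. The following brute-force sketch is an illustration under assumptions, not part of the paper: formulas use a hypothetical tuple encoding, programs are lists of (head, body) pairs, and only atoms (no classical negation) are used so that every set of atoms is consistent.

```python
from itertools import combinations

TOP, BOT = ("top",), ("bot",)  # hypothetical encoding of the 0-place connectives


def sat(X, F):
    """X |= F for a set of atoms X and a nested formula F."""
    if isinstance(F, str):
        return F in X
    if F[0] in ("top", "bot"):
        return F[0] == "top"
    if F[0] == "not":
        return not sat(X, F[1])
    if F[0] == "and":
        return sat(X, F[1]) and sat(X, F[2])
    if F[0] == "or":
        return sat(X, F[1]) or sat(X, F[2])
    raise ValueError(f"unknown connective: {F[0]!r}")


def reduct(P, Y):
    """Reduct of program P relative to Y."""
    def red(F):
        if isinstance(F, str) or F[0] in ("top", "bot"):
            return F
        if F[0] == "not":
            return BOT if sat(Y, F[1]) else TOP
        return (F[0], red(F[1]), red(F[2]))
    return [(red(h), red(b)) for h, b in P]


def closed(X, P):
    return all(sat(X, h) for h, b in P if sat(X, b))


def subsets(S):
    items = sorted(S)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def se_models(P, atoms):
    """Pairs (X, Y), X a subset of Y, with both closed under the reduct of P by Y."""
    models = set()
    for Y in subsets(atoms):
        PY = reduct(P, Y)
        if closed(Y, PY):
            models |= {(X, Y) for X in subsets(Y) if closed(X, PY)}
    return models


def example_programs(F, G):
    """Build the two programs of this section for given formulas F and G."""
    constraint = (BOT, ("and", F, G))
    P1 = [(("or", F, G), TOP), constraint]
    P2 = [(F, ("not", G)), (G, ("not", F)), constraint]
    return P1, P2


P1, P2 = example_programs("p", "q")
print(se_models(P1, {"p", "q"}) == se_models(P2, {"p", "q"}))
```

A check of this kind covers only the chosen instances; the four-case argument is what establishes the claim for arbitrary F and G.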

When strong equivalence is characterized using the logic of here-and-there,
we immediately obtain a replacement theorem: strong equivalence is preserved
under substitution of formulas that are equivalent in the logic of
here-and-there. And of course it follows that if formulas F and G are
satisfied by the same (here-and-there) models of a program P, then, for any
program Q, occurrences of F in Q can be replaced by G without affecting the
answer sets of $P \cup Q$. One can provide a similar facility using SE-models.
Let us begin with two definitions.

We say that NLP formulas F and G are equivalent relative to logic program P
if, for every SE-model $(X, Y)$ of P, $X \models F^Y$ iff $X \models G^Y$.

An occurrence of a formula is regular if it is not an atom preceded by
$\neg$.

Theorem 2. Let P be a program, and let F and G be formulas equivalent
relative to P. For any program Q, and any program Q' obtained from Q by
replacing regular occurrences of F by G, programs $P \cup Q$ and $P \cup Q'$
are strongly equivalent.

The restriction to regular occurrences is essential. For example, formulas p
and q are equivalent relative to program
$P_3 = \{p \leftarrow q,\ q \leftarrow p\}$, yet programs
$P_3 \cup \{\neg p\}$ and $P_3 \cup \{\neg q\}$ are not strongly equivalent.

Theorem 2 is a more widely-applicable version of Proposition 3 from [3].
There we defined equivalence of formulas more strictly, and did not make it
relative to a program. We also used a notion of “equivalence” of programs
stronger than strong equivalence. Although it is not done here, a proof of
Theorem 2 can be easily constructed based on the corresponding proof from the
earlier paper. (Section 9 does include a similar proof, that of the
corresponding theorem for “nested” default logic.) Alternatively, just as
Proposition 1 related the SE-models $(X, Y)$ of a program P with the models
$(X', Y')$ of program P' under the logic of here-and-there, one can show that
NLP formulas F and G are equivalent relative to program P iff the
corresponding formulas F' and G' are satisfied by the same models of P' in the
logic of here-and-there.

Many formula equivalences are proved in [3] (Proposition 4), and of course
they also hold under our (weaker) definition (relative to the empty program).
Thus, Theorem 2 implies, for instance, that replacing subformulas of the form
not (F, G) with not F; not G yields a strongly equivalent program.

For another example using Theorem 2, observe that for any program Q,
and any program Q' obtained from Q by replacing occurrences of not F by G
and/or not G by F, programs $P_2 \cup Q$ and $P_2 \cup Q'$ are strongly
equivalent.

6 Nested Default Logic

For the study of strong equivalence of default theories, it is convenient to in- 
troduce a “nested” version of default logic that generalizes disjunctive default 
logic [2], which in turn generalizes Reiter’s default logic. 

The relatively uniform syntax of nested default logic will make it more con- 
venient for stating and using strong equivalence results. (We don’t have to deal 
separately with a prerequisite and a set of justifications — they are expressed in 
a single formula.) 

As one might expect, the definitions for nested default logic are almost exactly 
as for nested logic programs — essentially, allow arbitrary formulas of classical 
logic in place of literals, and use consistent, logically closed sets of formulas in 
place of consistent sets of literals. Accordingly, the strong equivalence theorem 
(and its proof!) is nearly identical too. 

6.1 Syntax 

Let us say classical formula to mean a formula of classical propositional logic. 

NDL formulas are built from classical formulas using the unary connective
not (negation as failure) and the binary connectives | (strong disjunction)
and $\wedge$ (conjunction). (There is no need for a distinct “strong
conjunction” connective.) An NDL rule is an expression of the form

    $\frac{F}{G}$

where F and G are NDL formulas, called the condition and the conclusion of
the rule.

A nested default theory is a set of NDL rules.

When convenient, a rule of the form $\frac{\top}{F}$ will be identified with
formula F.



6.2 Semantics 

Let us use the term candidate set for a consistent set of classical formulas that 
is closed under classical propositional logic. 

We can recursively define when a candidate set X satisfies an NDL formula F
(symbolically, $X \models F$), as follows.

- For classical F, $X \models F$ iff $F \in X$.

- $X \models (F \wedge G)$ iff $X \models F$ and $X \models G$.

- $X \models (F | G)$ iff $X \models F$ or $X \models G$.

- $X \models \mathit{not}\ F$ iff $X \not\models F$.

A candidate set X is closed under a default theory P if, for every rule
$\frac{F}{G}$ in P, $X \models F$ implies $X \models G$.

The reduct of an NDL formula F relative to a candidate set X (written $F^X$)
is obtained by replacing every maximal occurrence in F of a formula of the
form not G with $\bot$ if $X \models G$ and with $\top$ otherwise. The reduct
of a default theory P relative to X (written $P^X$) is obtained by replacing
the condition and conclusion of each rule in P by their reducts relative to X.

A candidate set X is an extension of P if it is minimal among the candidate
sets closed under $P^X$.
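
Over a finite signature these definitions can also be explored by brute force. The sketch below rests on an assumption that is not spelled out in the text: a candidate set (consistent and logically closed) is identified with its nonempty set of models, so that $F \in X$ holds exactly when every model of X satisfies F, and a smaller theory has more models. Classical formulas are given as Python predicates on interpretations; all names are hypothetical.

```python
from itertools import combinations

ATOMS = ("p", "q")  # hypothetical finite signature for the sketch
# An interpretation is the frozenset of atoms it makes true.
INTERPS = [frozenset(c) for r in range(len(ATOMS) + 1)
           for c in combinations(ATOMS, r)]
# A candidate set is represented by its nonempty set of models.
CANDIDATES = [frozenset(c) for r in range(1, len(INTERPS) + 1)
              for c in combinations(INTERPS, r)]


def cl(test):
    """Wrap a classical formula, given as a predicate on interpretations."""
    return ("cl", test)


TOP, BOT = cl(lambda I: True), cl(lambda I: False)


def sat(M, F):
    """Does the candidate set with model set M satisfy the NDL formula F?"""
    if F[0] == "cl":
        return all(F[1](I) for I in M)      # F belongs to the theory
    if F[0] == "not":
        return not sat(M, F[1])
    if F[0] == "and":
        return sat(M, F[1]) and sat(M, F[2])
    if F[0] == "or":                        # strong disjunction |
        return sat(M, F[1]) or sat(M, F[2])
    raise ValueError(f"unknown connective: {F[0]!r}")


def reduct(F, M):
    """Replace each maximal `not G` by BOT if the candidate set satisfies G, else TOP."""
    if F[0] == "cl":
        return F
    if F[0] == "not":
        return BOT if sat(M, F[1]) else TOP
    return (F[0], reduct(F[1], M), reduct(F[2], M))


def closed(M, D):
    """D is a list of (condition, conclusion) pairs."""
    return all(sat(M, concl) for cond, concl in D if sat(M, cond))


def extensions(D):
    """Candidate sets that are minimal (as theories) among those closed under the reduct."""
    found = []
    for M in CANDIDATES:
        DM = [(reduct(c, M), reduct(g, M)) for c, g in D]
        # A smaller theory has a strictly larger model set, hence `N > M`.
        if closed(M, DM) and not any(closed(N, DM) for N in CANDIDATES if N > M):
            found.append(M)
    return found


p = cl(lambda I: "p" in I)
not_p = cl(lambda I: "p" not in I)
# The NDL rule with condition `not (classical) not-p` and conclusion p,
# i.e. the counterpart of the Reiter default  : p / p.
default_p = [(("not", not_p), p)]
print(extensions(default_p))
```

The single extension found for `default_p` is the model set of all interpretations containing p, that is, the logical closure of {p}.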



6.3 Relation to (Nested) Logic Programming 

Essentially, nested logic programming is a special case of nested default
logic. Every NLP formula F corresponds to the NDL formula $d(F)$ obtained by
replacing occurrences of the connectives ; and , with | and $\wedge$
respectively. A nested logic program corresponds to the default theory
obtained by replacing each NLP rule $F \leftarrow G$ with
$\frac{d(G)}{d(F)}$. A consistent set of literals corresponds to the candidate
set whose formulas are its consequences (in classical logic).

Proposition 2. The answer sets for any nested logic program correspond to the 
extensions of the corresponding nested default theory. 



6.4 Relation to (Disjunctive) Default Logic 

Nested default logic generalizes disjunctive default logic [2], which in turn gener- 
alizes Reiter’s default logic. Here we review the definition of disjunctive default 
logic and relate it to nested default logic. 

A disjunctive default rule is an expression of the form

    $\frac{\alpha : \beta_1, \ldots, \beta_m}{\gamma_1 | \cdots | \gamma_n}$    (1)

where $\alpha, \beta_1, \ldots, \beta_m, \gamma_1, \ldots, \gamma_n$ are
classical formulas ($m \geq 0$, $n \geq 1$). Reiter’s default logic
corresponds to the special case when $n = 1$.^1

A disjunctive default rule (1) corresponds to the NDL rule

    $\frac{\alpha \wedge \mathit{not}\ \neg\beta_1 \wedge \cdots \wedge \mathit{not}\ \neg\beta_m}{\gamma_1 | \cdots | \gamma_n}$

A disjunctive default theory is a set of disjunctive default rules.

Let P be a disjunctive default theory and X a set of classical formulas. Define

    $P^X = \left\{ \frac{\alpha}{\gamma_1 | \cdots | \gamma_n} \;:\; \frac{\alpha : \beta_1, \ldots, \beta_m}{\gamma_1 | \cdots | \gamma_n} \in P \text{ and } \neg\beta_1, \ldots, \neg\beta_m \notin X \right\}$

A set Y of classical formulas is closed under $P^X$ if, for every member of
$P^X$, if $\alpha \in Y$ then at least one of $\gamma_1, \ldots, \gamma_n$
belongs to Y.

We say X is an extension of P if X is minimal among sets of formulas closed
under propositional logic and closed under $P^X$.

^1 In Reiter’s formulation, a default theory is a pair $(P, W)$, where the
second component W is a set of classical formulas. Here we suppress the second
component, since every $p \in W$ can be equivalently represented in P by the
rule $\frac{\top :}{p}$.

Proposition 3. A candidate set X is an extension of a disjunctive default
theory P iff it is an extension of the corresponding nested default theory.

Proposition 3 restricts attention to candidate sets (which are by definition 
consistent) because, unlike nested default logic, disjunctive default logic allows 
for the possibility of an inconsistent extension. 



7 Strong Equivalence of Default Theories 

Nested default theories P and Q are equivalent if they have the same
extensions. They are strongly equivalent if, for any nested default theory R,
$P \cup R$ and $Q \cup R$ are equivalent.

For nested default theory P, and candidate sets X, Y with $X \subseteq Y$, the
pair $(X, Y)$ is an SE-model of P if both X and Y are closed under $P^Y$.

Theorem 3. Nested default theories are strongly equivalent iff they have the 
same SE-models. 

A proof of Theorem 3 is easily obtained from the proof of Theorem 1, and 
so is not presented in this paper. (Essentially, replace references to “consistent 
sets of literals” with references to “candidate sets.”) 

The proof shows that any two nested default theories that are not strongly 
equivalent can be distinguished by adding a nested default theory in which the 
conditions and conclusions of all rules are classical formulas. 



8 Equivalent Transformations of Default Theories 

As with logic programs (using Theorem 1), Theorem 3 can be used, for example,
to show that in any default theory containing the rule

    $\frac{F \wedge G}{\bot}$

the rule

    $\frac{\top}{F | G}$

can be replaced by the rules

    $\frac{\mathit{not}\ F}{G}$,   $\frac{\mathit{not}\ G}{F}$.
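
For one concrete choice of F and G this replacement can be cross-checked by enumerating SE-models over a two-atom signature. The sketch is an illustration only and rests on an assumption not stated in the text: a candidate set is represented by its nonempty set of models, so $X \subseteq Y$ for theories becomes a superset test on model sets. Classical formulas are Python predicates, and all names are hypothetical.

```python
from itertools import combinations

ATOMS = ("p", "q")  # hypothetical finite signature
INTERPS = [frozenset(c) for r in range(len(ATOMS) + 1)
           for c in combinations(ATOMS, r)]
# Candidate sets, each represented by its nonempty set of models.
CANDIDATES = [frozenset(c) for r in range(1, len(INTERPS) + 1)
              for c in combinations(INTERPS, r)]


def cl(test):
    """Wrap a classical formula, given as a predicate on interpretations."""
    return ("cl", test)


TOP, BOT = cl(lambda I: True), cl(lambda I: False)


def sat(M, F):
    if F[0] == "cl":
        return all(F[1](I) for I in M)
    if F[0] == "not":
        return not sat(M, F[1])
    if F[0] == "and":
        return sat(M, F[1]) and sat(M, F[2])
    if F[0] == "or":                       # strong disjunction |
        return sat(M, F[1]) or sat(M, F[2])
    raise ValueError(f"unknown connective: {F[0]!r}")


def reduct(F, M):
    if F[0] == "cl":
        return F
    if F[0] == "not":
        return BOT if sat(M, F[1]) else TOP
    return (F[0], reduct(F[1], M), reduct(F[2], M))


def closed(M, D):
    return all(sat(M, concl) for cond, concl in D if sat(M, cond))


def se_models(D):
    """Pairs (X, Y) of candidate sets, X a subtheory of Y, both closed under the reduct by Y."""
    models = set()
    for MY in CANDIDATES:
        DY = [(reduct(c, MY), reduct(g, MY)) for c, g in D]
        if closed(MY, DY):
            # X is a subset of Y as theories iff models(X) is a superset of models(Y).
            models |= {(MX, MY) for MX in CANDIDATES
                       if MX >= MY and closed(MX, DY)}
    return models


p = cl(lambda I: "p" in I)
q = cl(lambda I: "q" in I)
constraint = (("and", p, q), BOT)                       # (F and G) / BOT
D1 = [constraint, (TOP, ("or", p, q))]                  # TOP / (F | G)
D2 = [constraint, (("not", p), q), (("not", q), p)]     # not F / G,  not G / F
print(se_models(D1) == se_models(D2))
```

Dropping `constraint` from both theories makes the two SE-model sets differ, which matches the role the rule plays in the statement above.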

Moreover, it is clear that replacing any occurrence of one classical formula 
with another that is logically equivalent (in classical logic) yields a strongly 
equivalent default theory. 

We can formulate an additional replacement theorem, similar to Theorem 2 
for logic programming, thus extending our account of when an occurrence of one 
formula may be safely replaced by another. Again we need some definitions first. 

We say that NDL formulas F and G are equivalent relative to nested default
theory P if, for every SE-model $(X, Y)$ of P, $X \models F^Y$ iff
$X \models G^Y$.

An occurrence of a subformula in an NDL formula is called regular if it is
not a proper subpart of an occurrence of a subformula formed by an application
of $\neg$ or $\vee$.

Theorem 4. Let P be a nested default theory, and let F and G be formulas
equivalent relative to P. For any nested default theory Q, and any nested
default theory Q' obtained from Q by replacing some regular occurrences of F
by G, nested default theories $P \cup Q$ and $P \cup Q'$ are strongly
equivalent.

As with Theorem 2, the restriction to regular occurrences is essential. (And 
essentially the same example shows this.) 

Theorem 4 can be used to show, for example, that in any nested default
theory with rules

    $\frac{\mathit{not}\ F}{\neg F}$,   $\frac{\mathit{not}\ \neg F}{F}$

any occurrence of NDL formula $F | G$ (for any classical formula G) can be
safely replaced with $F \vee G$.

9 Proofs Related to Nested Default Logic 

Proposition 2. The answer sets for any nested logic program correspond to the 
extensions of the corresponding nested default theory. 

For any candidate set X, let $l(X)$ denote the set of all literals in X.

Lemma 1. For any NLP formula F and candidate set X, $l(X) \models F$ iff
$X \models d(F)$.

Proof. Straightforward, by structural induction. □

Lemma 2. For any NLP formula F and candidate set X,
$d(F^{l(X)}) = d(F)^X$.

Proof. Follows easily from Lemma 1 and the definitions. □

Lemma 3. For any nested logic program P and candidate sets X and Y, $l(X)$ is
closed under $P^{l(Y)}$ iff X is closed under $d(P)^Y$.

Proof. Follows easily from Lemmas 1 and 2, and the definitions. □

Proof of Proposition 2: Take any nested logic program P. Assume X is an
answer set for P. So X is a consistent set of literals closed under $P^X$, and
no proper subset of X is closed under $P^X$. Let Y be the candidate set
corresponding to X. By Lemma 3, Y is closed under $d(P)^Y$. Suppose a
candidate set Z with $Z \subseteq Y$ is closed under $d(P)^Y$. By Lemma 3,
$l(Z)$ is closed under $P^X$. Since $Z \subseteq Y$, $l(Z) \subseteq X$. We
conclude by choice of X that $l(Z) = X$. It follows that $Z = Y$. So Y is an
extension of $d(P)$. Proof in the other direction is similar. □

Proposition 3. A candidate set X is an extension of a disjunctive default
theory P iff it is an extension of the corresponding nested default theory.

Proof. It is clear that for any disjunctive default theory P and corresponding
nested default theory Q, and any candidate sets X and Y, X is closed under
$P^Y$ iff X is closed under $Q^Y$, from which the result follows. □

Theorem 4. Let P be a nested default theory, and let F and G be formulas
equivalent relative to P. For any nested default theory Q, and any nested
default theory Q' obtained from Q by replacing some regular occurrences of F
by G, nested default theories $P \cup Q$ and $P \cup Q'$ are strongly
equivalent.

The proof of Theorem 4 is very similar to the proof of Proposition 3 from [3],
and illustrates how a proof of Theorem 2 might go.

We begin with an easily verified lemma. 

Lemma 4. For any NDL formula F and candidate set X, $X \models F$ iff
$X \models F^X$.

Lemma 5. Let F and G be NDL formulas equivalent relative to nested default
theory P. If an NDL formula H' can be obtained from an NDL formula H by
replacing some regular occurrences of F by G, then H and H' are equivalent
relative to P.

Proof. Consider any SE-model $(X, Y)$ of P. We need to show that
$X \models H^Y$ iff $X \models (H')^Y$. Proof is by structural induction on H.

Case 1: H is an atom or $H = \neg H_1$ or $H = H_1 \vee H_2$. Then the only
regular occurrence of a formula in H is H itself. Consequently $H = F$ and
$H' = G$, and we’re done.

Case 2: $H = H_1 \wedge H_2$. If $H = F$ and $H' = G$ we’re done. Otherwise,
$H' = H'_1 \wedge H'_2$ and, by the induction hypothesis, $H_1$ and $H'_1$ are
equivalent relative to P, as are $H_2$ and $H'_2$. Then

\[
\begin{array}{rcl}
X \models H^Y & \text{iff} & X \models (H_1 \wedge H_2)^Y \\
 & \text{iff} & X \models H_1^Y \wedge H_2^Y \\
 & \text{iff} & X \models H_1^Y \text{ and } X \models H_2^Y \\
 & \text{iff} & X \models (H'_1)^Y \text{ and } X \models (H'_2)^Y \\
 & \text{iff} & X \models (H'_1)^Y \wedge (H'_2)^Y \\
 & \text{iff} & X \models (H'_1 \wedge H'_2)^Y \\
 & \text{iff} & X \models (H')^Y .
\end{array}
\]

Case 3: $H = H_1 | H_2$. Similar to Case 2.

Case 4: $H = \mathit{not}\ H_1$. If $H = F$ and $H' = G$ we’re done.
Otherwise, $H' = \mathit{not}\ H'_1$ and, by the induction hypothesis, $H_1$
and $H'_1$ are equivalent relative to P. Assume that
$X \models (\mathit{not}\ H_1)^Y$. Then $(\mathit{not}\ H_1)^Y = \top$, so
$Y \not\models H_1$. It follows by Lemma 4 that $Y \not\models H_1^Y$. Since
$(X, Y)$ is an SE-model of P, so is $(Y, Y)$. Since $H_1$ and $H'_1$ are
equivalent relative to P, we can conclude that $Y \not\models (H'_1)^Y$. By
Lemma 4, $Y \not\models H'_1$. So $(\mathit{not}\ H'_1)^Y = \top$, and thus
$X \models (\mathit{not}\ H'_1)^Y$. The other direction is symmetric. □

Proof of Theorem 4: Assume that Q' can be obtained from Q by replacing
some regular occurrences of F by a formula G that is equivalent relative to P.
We must show that $P \cup Q$ and $P \cup Q'$ have the same SE-models.

Consider any SE-model $(X, Y)$ of P. It is enough to show that both X
and Y are closed under $Q^Y$ iff both are closed under $(Q')^Y$. So consider
any rule $\frac{H_1}{H_2} \in Q$, along with the corresponding rule
$\frac{H'_1}{H'_2} \in Q'$. By Lemma 5, $X \models H_1^Y$ iff
$X \models (H'_1)^Y$, and similarly $X \models H_2^Y$ iff
$X \models (H'_2)^Y$. We conclude that X is closed under $Q^Y$ iff it is
closed under $(Q')^Y$. Since $(Y, Y)$ is also an SE-model of P, the same
argument can be used to show that Y is closed under $Q^Y$ iff it is closed
under $(Q')^Y$. □

Abstract. In this paper, the expressive power of disjunctive rules
involving default negation is analyzed within a framework based on
polynomial, faithful and modular (PFM) translations. The analysis is
restricted to the stable semantics of disjunctive logic programs. Of
particular interest is the effect of allowing default negation in the heads
of disjunctive rules. It is established in the paper that occurrences
of default negation can be removed from the heads of rules using a PFM 
translation when default negation is allowed in the bodies of rules. In 
this case, we may conclude that default negation appearing in the heads 
of rules does not affect expressive power of rules. However, in the case 
that default negation may not be used in the bodies of rules, such a PFM 
translation is no longer possible. Moreover, there is no PFM translation 
for removing default negation from the bodies of rules. Consequently, 
disjunctive logic programs with default negation in the bodies of rules 
are strictly more expressive than those without. 



1 Introduction 

Logic programming with answer sets [6,7] as proposed by Gelfond and Lifschitz
has recently been recognized as a logic programming paradigm of its own [22,23].
This is mainly because problems from many domains such as planning [18],
configuration [30] and verification [9] have attractive formulations as logic programs
under the answer set semantics [6]. Much of the promise of the paradigm is also
due to efficient implementations [17,24] that currently allow computing answer
sets for logic programs with thousands of rules. Being able to handle programs of
this scale has already turned out to be sufficient to enable industrial applications
of the answer set programming approach.

Our interest in answer set programming lies in comparing the expressive powers
of the various types of rules that have been introduced by the logic programming
community. This paper can be viewed as a continuation of previous work on
the expressive power of non-monotonic logics [8,10,12,13,14]. The author [15]
extends similar techniques to some syntactically restricted classes of logic programs.
The analysis is based on the existence of polynomial, faithful and modular
(PFM) translation functions between classes. This gives rise to a hierarchy of
classes of logic programs ordered by expressive power. However, the results
presented in [15] are limited to very special subclasses of normal logic programs,
since the goal is to study how the number of positive body literals affects the
expressiveness of rules. In this paper, more general classes of logic programs
involving disjunction are taken into consideration. The semantics of programs in
these classes is determined by respective generalizations [7,19] of the answer set
semantics [6].

* A preliminary version of this paper was presented at the 5th Dutch-German
Workshop on Nonmonotonic Reasoning Techniques and their Applications
(DGNMR'01).

** Support from Academy of Finland (project 43963) is acknowledged with gratitude.

T. Eiter, W. Faber, and M. Truszczynski (Eds.): LPNMR 2001, LNAI 2173, pp. 93-106, 2001.
© Springer-Verlag Berlin Heidelberg 2001

Tomi Janhunen

Historically speaking, the answer set semantics has its roots in the stable
model semantics [5] of normal logic programs (also known as general logic
programs [20]). This class is obtained from ordinary logic programs (that consist of
rules that are effectively Horn clauses) by allowing the use of a form of negation,
negation as failure to prove [20], in the bodies of rules. Due to close
interconnections to Reiter's default logic [28], this form of negation is also known
as default negation. Default negation differs from classical negation and it is
therefore quite natural that Gelfond and Lifschitz proposed a logic programming
approach with both negations [6]. This is how the answer set semantics
originated as a generalization of the stable model semantics. Later on, Gelfond
and Lifschitz extended the answer set semantics to cover disjunctive logic
programs with classical negation [7] (Przymusinski [26] presented similar ideas, but
in a more general setting). The latest generalization [19,18] of answer set
programming allows occurrences of default negation in the heads of disjunctive rules
as well.

In this paper, we restrict ourselves to the class of disjunctive logic programs 
without classical negation and use PFM translation functions to evaluate the 
effects of extending the rule language with default negation (i) in the bodies 
of rules, (ii) in the heads of rules, and (iii) in both. The rest of the paper is 
organized as follows. Section 2 gives a brief introduction to disjunctive logic 
programs and the stable model semantics. In Section 3, we present the analysis 
method based on PFM translation functions. The method is then applied in 
Section 4 to evaluate the effects of default negation on the expressiveness of 
disjunctive rules. After that some comparisons with related work are performed 
in Section 5. Finally, the paper ends with a discussion in Section 6. 

2 Disjunctive Logic Programs 

In this paper, we consider disjunctive logic programs in the propositional case^1.
We let $\sim$ stand for default negation in order to distinguish it from classical
negation $\neg$. Given a (propositional) atom $a$, we define positive and negative
literals as expressions of the forms $a$ and $\sim a$, respectively. To handle sets of
negative literals nicely, we define $\sim A = \{\sim a \mid a \in A\}$ for a set of atoms $A$.
In general, a disjunctive logic program $P$ is a set of rules of the form

$A \lor \sim B \leftarrow C \land \sim D$    (1)

where $A$, $B$, $C$ and $D$ are sets of atoms. The literals in $A \cup \sim B$ form the head
of the rule while the literals in $C \cup \sim D$ form the body of the rule. The intuition
behind a rule of the form (1) is that if all the atoms in $C$ can be inferred and
none of the atoms in $D$ can be inferred, then one of the atoms in $A$ can be
inferred or one of the atoms in $B$ cannot be inferred. This is how the head of
the rule is interpreted disjunctively while the body is subject to a conjunctive
interpretation^2. The Herbrand base $\mathrm{Hb}(P)$ of a disjunctive logic program $P$ is
the set of atoms that appear in $P$. The class of all disjunctive logic programs is
denoted by $\mathcal{D}$. A disjunctive logic program $P$ is positive if all rules (1) of $P$ satisfy
$B = \emptyset$ and $D = \emptyset$. Quite similarly, a program $P$ is head-positive (alternatively
body-positive) if all rules (1) of $P$ satisfy $B = \emptyset$ (alternatively $D = \emptyset$). The
respective classes of disjunctive logic programs are denoted by $\mathcal{D}^{+}$, $\mathcal{D}^{h+}$, and
$\mathcal{D}^{b+}$. These definitions imply that $\mathcal{D}^{+} \subset \mathcal{D}^{h+} \subset \mathcal{D}$ and
$\mathcal{D}^{+} \subset \mathcal{D}^{b+} \subset \mathcal{D}$.

^1 Disjunctive programs with variables are also covered through Herbrand instantiation.
In the presence of function symbols, Herbrand instantiation produces an infinite (but
countable) propositional program out of a finite disjunctive program with variables.

2.1 Stable Models and Answer Sets 

Because this paper is restricted to classes of disjunctive logic programs without
classical negation, the forthcoming definition of stable models coincides with
that of answer sets [19]. The standard way to define the semantics of a positive
disjunctive logic program $P$ is to distinguish models of $P$ that are minimal as
follows. An interpretation of $P$ is simply a subset of $\mathrm{Hb}(P)$, and a rule
$A \leftarrow C$ of $P$ is satisfied in an interpretation $I \subseteq \mathrm{Hb}(P)$ of $P$ if
$C \subseteq I$ implies $A \cap I \neq \emptyset$. A set of atoms $M \subseteq \mathrm{Hb}(P)$ is a model
of $P$ if all rules of $P$ are satisfied in $M$. A model $M$ of $P$ is a (subset) minimal
model of $P$ if there is no model $M'$ of $P$ such that $M' \subset M$. By this definition,
it is possible that a positive disjunctive program has no minimal models
($P_1 = \{a \leftarrow,\ \leftarrow a\}$), a unique minimal model ($P_2 = \{a \leftarrow\}$)
or even several minimal models ($P_3 = \{a \lor b \leftarrow\}$). By a slight abuse of notation,
we write $M = \mathrm{MM}(P)$ to declare that $M$ is one of the minimal models of a
positive disjunctive logic program $P$. Thus we may write $M_1 = \{a\} = \mathrm{MM}(P_3)$
as well as $M_2 = \{b\} = \mathrm{MM}(P_3)$ although these models are not unique.
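
The three cases above can be checked mechanically. The following Python sketch is not part of the original paper: it enumerates all interpretations by brute force, so it is only feasible for tiny programs, and the representation of a positive rule as a pair (head, body) of frozensets as well as the function names are assumptions made for illustration.

```python
from itertools import combinations

def subsets(atoms):
    """Yield every subset of `atoms` as a frozenset, smallest first."""
    atoms = sorted(atoms)
    for k in range(len(atoms) + 1):
        for combo in combinations(atoms, k):
            yield frozenset(combo)

def rule(head=(), body=()):
    """Positive rule A <- C stored as a pair of frozensets (illustrative encoding)."""
    return (frozenset(head), frozenset(body))

def is_model(rules, interp):
    """A rule (A, C) is satisfied in I if C <= I implies that A and I intersect."""
    return all(not body <= interp or bool(head & interp) for head, body in rules)

def minimal_models(rules):
    """Return all subset-minimal models of a positive disjunctive program."""
    base = frozenset().union(*(head | body for head, body in rules))
    models = [i for i in subsets(base) if is_model(rules, i)]
    return [m for m in models if not any(n < m for n in models)]

P1 = [rule({"a"}), rule((), {"a"})]   # {a <- , <- a}: no models at all
P2 = [rule({"a"})]                    # {a <- }: unique minimal model
P3 = [rule({"a", "b"})]               # {a v b <- }: two minimal models

for name, program in (("P1", P1), ("P2", P2), ("P3", P3)):
    print(name, [sorted(m) for m in minimal_models(program)])
```

The integrity constraint in P1 is encoded as a rule with an empty head, which can never be satisfied once its body holds.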

The stable model semantics of disjunctive logic programs is obtained via
the Gelfond-Lifschitz reduction of a disjunctive logic program $P$ [7,18,19] which
presumes a model candidate $M$. The reduced program

$P^M = \{A \leftarrow C \mid A \lor \sim B \leftarrow C \land \sim D \in P,\ B \subseteq M, \text{ and } D \cap M = \emptyset\}$    (2)

is a positive one. A model $M$ of a disjunctive logic program $P$ is stable if $M$ is
a minimal model of $P^M$ (not necessarily a unique one), i.e., $M = \mathrm{MM}(P^M)$.

^2 Rather than using a set-based notation (1), heads and bodies of rules are often
written as disjunctions and conjunctions, respectively. For instance, when $A = \{a\}$,
$B = \{b\}$, $C = \{c\}$, and $D = \{d\}$, we write $a \lor \sim b \leftarrow c \land \sim d$ for the rule (1).
Example 1. Consider logic programs $P_1 = \{a \lor \sim a \leftarrow\}$ [19] and
$P_2 = \{a \leftarrow a\}$. The former has two stable models $M_1 = \{a\}$ and
$M_2 = \emptyset$ while $M_2$ is the unique stable model of $P_2$. Note that $M_1$ is
also a model of $P_2$, but not a minimal one.

The program $P_1$ illustrates how the negative literal $\sim a$ in the head lets us
express succinctly a choice regarding $a$: either $a$ is in the model ($a \in M_1$) or $a$
is not in the model ($a \notin M_2$). Simons [29] achieves the same effect by enriching
normal logic programs with choice rules. Note that due to the negative literal
$\sim a$ in the head of the only rule of $P_1$, the stable models $M_1$ and $M_2$ of $P_1$ break
the well-known anti-chain property: $M_2 \subseteq M_1$ holds although $M_1 \neq M_2$.
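
Example 1 can be reproduced with a small brute-force checker. The sketch below is an illustration only and not the author's implementation: it encodes a rule (1) as a hypothetical 4-tuple (A, B, C, D) of frozensets, builds the reduct (2) for every candidate M, and tests minimality by enumerating all proper subsets, which is exponential in the number of atoms.

```python
from itertools import combinations

def subsets(atoms):
    """Yield every subset of `atoms` as a frozenset, smallest first."""
    atoms = sorted(atoms)
    for k in range(len(atoms) + 1):
        for combo in combinations(atoms, k):
            yield frozenset(combo)

def rule(A=(), B=(), C=(), D=()):
    """Rule (1), A v ~B <- C ^ ~D, stored as four frozensets (illustrative encoding)."""
    return tuple(frozenset(s) for s in (A, B, C, D))

def reduct(program, M):
    """Reduct (2): keep A <- C for each rule with B <= M and D disjoint from M."""
    return [(A, C) for A, B, C, D in program if B <= M and not D & M]

def is_model(positive_rules, I):
    """Every rule A <- C with C <= I must have a head atom in I."""
    return all(not C <= I or bool(A & I) for A, C in positive_rules)

def stable_models(program):
    """Return every M that is a minimal model of its own reduct."""
    base = frozenset().union(*(A | B | C | D for A, B, C, D in program))
    found = []
    for M in subsets(base):
        red = reduct(program, M)
        if is_model(red, M) and not any(is_model(red, N) for N in subsets(M) if N < M):
            found.append(M)
    return found

P1 = [rule(A={"a"}, B={"a"})]   # a v ~a <-
P2 = [rule(A={"a"}, C={"a"})]   # a <- a

print([sorted(m) for m in stable_models(P1)])
print([sorted(m) for m in stable_models(P2)])
```

For P1 the two stable models are comparable under set inclusion, which is exactly the failure of the anti-chain property noted above.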

3 Polynomial, Faithful and Modular Translations 

In this paper, we employ a framework of polynomial, faithful and modular
translation functions for comparing the expressive powers of classes $\mathcal{C}$ of logic
programs [15]. Some basic assumptions are imposed on any class $\mathcal{C}$ of logic programs.
First of all, the class $\mathcal{C}$ is supposed to be closed under unions, i.e., given any two
programs $P$ and $P'$ from $\mathcal{C}$, then also $P \cup P'$ belongs to $\mathcal{C}$. Second,
it is assumed that $\mathcal{C}$ has a semantic operator $\mathrm{Sem}_{\mathcal{C}}$ associated with it. The
operator $\mathrm{Sem}_{\mathcal{C}}$ assigns a set of interpretations $I \subseteq \mathrm{Hb}(P)$ to each program
$P$ of $\mathcal{C}$. Typically, these interpretations are distinguished models of $P$. It is clear that the
classes $\mathcal{D}$, $\mathcal{D}^{+}$, $\mathcal{D}^{h+}$, and $\mathcal{D}^{b+}$ satisfy these criteria. The semantic
operator $\mathrm{Sem}_{\mathcal{C}}$ is the same for each class $\mathcal{C}$ of these: $\mathrm{Sem}_{\mathcal{C}}$ assigns
$\{M \subseteq \mathrm{Hb}(P) \mid M = \mathrm{MM}(P^M)\}$ to a program $P$ whenever $P$ is a member of the
respective class $\mathcal{C}$.

In the following definition, we list the general requirements for a translation
function $\mathrm{Tr}$ that transforms logic programs $P$ of one class $\mathcal{C}$ into logic programs
$\mathrm{Tr}(P)$ of another class $\mathcal{C}'$. The latter class is assumed to be a subclass or a
superclass of $\mathcal{C}$. We let $\|P\|$ stand for the length of $P$ in symbols.

Definition 1. Given two classes of logic programs $\mathcal{C}$ and $\mathcal{C}'$ that are closed under
unions and the respective semantic operators $\mathrm{Sem}_{\mathcal{C}}$ and $\mathrm{Sem}_{\mathcal{C}'}$, a translation
function $\mathrm{Tr} : \mathcal{C} \to \mathcal{C}'$ is

- polynomial if for all logic programs $P \in \mathcal{C}$, the time required to compute
the translation $\mathrm{Tr}(P) \in \mathcal{C}'$ is polynomial in $\|P\|$,

- faithful if (i) for all logic programs $P \in \mathcal{C}$, the base $\mathrm{Hb}(P) \subseteq \mathrm{Hb}(\mathrm{Tr}(P))$
and (ii) the models/interpretations in $\mathrm{Sem}_{\mathcal{C}}(P)$ and $\mathrm{Sem}_{\mathcal{C}'}(\mathrm{Tr}(P))$ are in a
one-to-one correspondence and coincide up to $\mathrm{Hb}(P)$, and

- modular if (i) for all logic programs $P_1 \in \mathcal{C}$ and $P_2 \in \mathcal{C}$, the translation
$\mathrm{Tr}(P_1 \cup P_2) = \mathrm{Tr}(P_1) \cup \mathrm{Tr}(P_2)$ and (ii) $\mathcal{C}' \subseteq \mathcal{C}$ implies that the
translation $\mathrm{Tr}(P') = P'$ for all logic programs $P' \in \mathcal{C}'$.

The faithfulness requirement implies that a translation function $\mathrm{Tr}$ may
introduce new atoms, but the number of such atoms is clearly bounded by
the polynomiality requirement. Let us also note that if $\mathrm{Tr}$ is faithful, then
$\mathrm{Sem}_{\mathcal{C}}(P) = \{M \cap \mathrm{Hb}(P) \mid M \in \mathrm{Sem}_{\mathcal{C}'}(\mathrm{Tr}(P))\}$ holds. The first part
of the modularity condition enforces locality of $\mathrm{Tr}$, since the translation of a program
$P_1 \cup P_2$ is obtained as the union of the translations of the subprograms $P_1$ and $P_2$. This
implies that programs can be translated rule by rule. The second part handles
cases where programs of a class $\mathcal{C}$ are translated into programs in a proper subclass
$\mathcal{C}'$ of $\mathcal{C}$. Such a class $\mathcal{C}'$ is typically obtained by restricting the syntax of
the rules of the programs in $\mathcal{C}$. In this setting, we require that syntactically
restricted rules are left intact by a translation function. Note that whenever
$\mathcal{C}' \subseteq \mathcal{C}$ holds, the joint effect of the modularity conditions (i) and (ii) is that
$\mathrm{Tr}(P' \cup P) = P' \cup \mathrm{Tr}(P)$ holds for all logic programs $P' \in \mathcal{C}'$ and $P \in \mathcal{C}$.

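We say that a translation function $\mathrm{Tr} : \mathcal{C} \to \mathcal{C}'$ is PFM if it satisfies all
three criteria. If such a translation function exists, we write $\mathcal{C} \leq_{PFM} \mathcal{C}'$ and
consider $\mathcal{C}'$ at least as expressive as $\mathcal{C}$. In certain cases, we can find a counter-example
which proves that a translation function satisfying our criteria does not exist.
We use the notation $\mathcal{C} \not\leq_{PFM} \mathcal{C}'$ in such cases. Any of the letters P, F, and M may
be omitted from the notation if the corresponding criterion is not needed in the
counter-example (note that $\mathcal{C} \not\leq_{FM} \mathcal{C}'$ implies $\mathcal{C} \not\leq_{PFM} \mathcal{C}'$, for instance).

More complex relations among classes of logic programs can be deduced from
the base relations $\leq_{PFM}$ and $\not\leq_{PFM}$. A class $\mathcal{C}$ is less expressive than $\mathcal{C}'$
(denoted by $\mathcal{C} <_{PFM} \mathcal{C}'$) if $\mathcal{C} \leq_{PFM} \mathcal{C}'$ and $\mathcal{C}' \not\leq_{PFM} \mathcal{C}$. Classes
$\mathcal{C}$ and $\mathcal{C}'$ are equally expressive (denoted by $\mathcal{C} =_{PFM} \mathcal{C}'$) if
$\mathcal{C} \leq_{PFM} \mathcal{C}'$ and $\mathcal{C}' \leq_{PFM} \mathcal{C}$. Classes $\mathcal{C}$ and $\mathcal{C}'$ are mutually
incomparable (denoted by $\mathcal{C} \neq_{PFM} \mathcal{C}'$) if $\mathcal{C} \not\leq_{PFM} \mathcal{C}'$ and
$\mathcal{C}' \not\leq_{PFM} \mathcal{C}$. By these relations, we have adapted the method proposed for
non-monotonic logics [13] to the case of logic programs (cf. [15] for a discussion on the
main differences).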

4 Expressive Power Analysis 

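Recall the inclusions $\mathcal{D}^{+} \subset \mathcal{D}^{h+} \subset \mathcal{D}$ stated in Section 2. Since the
semantic operators of these classes coincide, it follows by the existence of an identity
translation function $\mathrm{Tr}_{id}$ (i.e., $\mathrm{Tr}_{id}(P) = P$ holds for any $P$ from $\mathcal{D}^{+}$ or
$\mathcal{D}^{h+}$) that $\mathcal{D}^{+} \leq_{PFM} \mathcal{D}^{h+}$ and $\mathcal{D}^{h+} \leq_{PFM} \mathcal{D}$, but the strictness
of these relationships remains open. So let us begin our analysis by establishing
$\mathcal{D}^{+} <_{PFM} \mathcal{D}^{h+}$.

Theorem 1. $\mathcal{D}^{h+} \not\leq_{FM} \mathcal{D}^{+}$.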

Proof. Consider $P = \{a \leftarrow \sim a\}$ that clearly belongs to $\mathcal{D}^{h+}$. Then suppose there
is a faithful and modular translation function $\mathrm{Tr}$ that maps $P$ to a positive logic
program $\mathrm{Tr}(P)$ in $\mathcal{D}^{+}$. It follows by the faithfulness of $\mathrm{Tr}$ that
$\mathrm{Hb}(P) \subseteq \mathrm{Hb}(\mathrm{Tr}(P))$. In addition, the translation $\mathrm{Tr}(P)$ does not have minimal
models, since $P$ does not have stable models. This implies that $\mathrm{Tr}(P)$ has no models, i.e.,
$\mathrm{Tr}(P)$ is an inconsistent positive logic program. Then consider $P' = P \cup \{a \leftarrow\}$ for
which $\mathrm{Tr}(P') = \mathrm{Tr}(P) \cup \{a \leftarrow\}$ holds, as $\mathrm{Tr}$ is modular. But then $\mathrm{Tr}(P')$
has neither models nor minimal models, so that $P'$ does not have stable models, as $\mathrm{Tr}$ is
faithful. A contradiction, since $M = \{a\}$ is a stable model of $P'$. □
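
The proof relies on two facts about the counter-example: $P = \{a \leftarrow \sim a\}$ has no stable models, while $P' = P \cup \{a \leftarrow\}$ has the stable model $\{a\}$. The following Python sketch checks both by brute force. It is an illustration under assumptions (rules encoded as hypothetical 4-tuples (A, B, C, D) of frozensets, exhaustive enumeration of interpretations) and does not model the translation function itself.

```python
from itertools import combinations

def subsets(atoms):
    """Yield every subset of `atoms` as a frozenset, smallest first."""
    atoms = sorted(atoms)
    for k in range(len(atoms) + 1):
        for combo in combinations(atoms, k):
            yield frozenset(combo)

def rule(A=(), B=(), C=(), D=()):
    """Rule A v ~B <- C ^ ~D as four frozensets (illustrative encoding)."""
    return tuple(frozenset(s) for s in (A, B, C, D))

def is_model(positive_rules, I):
    return all(not C <= I or bool(A & I) for A, C in positive_rules)

def stable_models(program):
    """Brute force: M is stable if it is a minimal model of the reduct w.r.t. M."""
    base = frozenset().union(*(A | B | C | D for A, B, C, D in program))
    found = []
    for M in subsets(base):
        red = [(A, C) for A, B, C, D in program if B <= M and not D & M]
        if is_model(red, M) and not any(is_model(red, N) for N in subsets(M) if N < M):
            found.append(M)
    return found

P = [rule(A={"a"}, D={"a"})]        # a <- ~a
P_prime = P + [rule(A={"a"})]       # P together with the fact a <-

print("stable models of P :", [sorted(m) for m in stable_models(P)])
print("stable models of P':", [sorted(m) for m in stable_models(P_prime)])
```

Adding a rule thus creates a stable model where none existed, which a positive target program cannot mimic: adding rules to an inconsistent positive program never restores consistency.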

Let us then concentrate on establishing that $\mathcal{D} \leq_{PFM} \mathcal{D}^{h+}$ which implies that
$\mathcal{D} =_{PFM} \mathcal{D}^{h+}$. For this result, we have to find a way to translate disjunctive logic
programs having occurrences of default negation in the heads of rules into head-positive
disjunctive logic programs. For each atom $a \in \mathrm{Hb}(P)$, we introduce a
new atom $a^\circ$ which is to mean that $a$ cannot be inferred by the rules. In analogy
to [6], the atom $a^\circ$ can be understood as a "positive occurrence" of the
negative literal $\sim a$. The difference is that we apply the idea to remove default
negation while Gelfond and Lifschitz aim to remove negative literals formed
with classical negation. For a set of atoms $A \subseteq \mathrm{Hb}(P)$, we let $A^\circ$ denote the
set $\{a^\circ \mid a \in A\}$. For any $P \in \mathcal{D}$, we distinguish a particular subset of $\mathrm{Hb}(P)$:
$\mathrm{Hd}^{\sim}(P) = \bigcup\{B \mid A \lor \sim B \leftarrow C \land \sim D \in P\}$ is the set of atoms that
appear negatively in the heads of the rules of $P$. Default negation can be removed from the
heads of rules using a translation function $\mathrm{Tr}_{h+}$ to be defined as follows.

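Definition 2. For a disjunctive logic program $P$, let $\mathrm{Tr}_{h+}(P)$ denote the translation
of $P$ into a head-positive disjunctive logic program

$\{\leftarrow a \land a^\circ,\ a^\circ \leftarrow \sim a \mid a \in \mathrm{Hd}^{\sim}(P)\} \cup \{A \cup B^\circ \leftarrow C \land \sim D \mid A \lor \sim B \leftarrow C \land \sim D \in P\}$.    (3)

Thus $\mathrm{Hb}(\mathrm{Tr}_{h+}(P)) = \mathrm{Hb}(P) \cup \mathrm{Hd}^{\sim}(P)^\circ$. Let us establish that $\mathrm{Tr}_{h+}$ is PFM.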
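
To make Definition 2 concrete, the following Python sketch implements the translation rule by rule and compares stable models before and after on the program $\{a \lor \sim a \leftarrow\}$ of Example 1. It is an illustration under assumptions rather than the author's code: rules are hypothetical 4-tuples (A, B, C, D) of frozensets, the new atom $a^\circ$ is named by appending the suffix "_o", and stable models are found by exhaustive search.

```python
from itertools import combinations

def rule(A=(), B=(), C=(), D=()):
    """Rule A v ~B <- C ^ ~D as four frozensets (illustrative encoding)."""
    return tuple(frozenset(s) for s in (A, B, C, D))

def circ(atom):
    # Hypothetical naming scheme for the new atom a° of Definition 2.
    return atom + "_o"

def tr_head_positive(program):
    """Sketch of Tr_h+ (Definition 2): remove default negation from heads."""
    hd_neg = frozenset().union(*(B for _, B, _, _ in program))
    out = []
    for a in sorted(hd_neg):
        out.append(rule(C={a, circ(a)}))        # <- a ^ a°
        out.append(rule(A={circ(a)}, D={a}))    # a° <- ~a
    for A, B, C, D in program:
        out.append(rule(A=A | {circ(b) for b in B}, C=C, D=D))  # A u B° <- C ^ ~D
    return out

def subsets(atoms):
    atoms = sorted(atoms)
    for k in range(len(atoms) + 1):
        for combo in combinations(atoms, k):
            yield frozenset(combo)

def is_model(positive_rules, I):
    return all(not C <= I or bool(A & I) for A, C in positive_rules)

def stable_models(program):
    """Brute force: M is stable if it is a minimal model of the reduct w.r.t. M."""
    base = frozenset().union(*(A | B | C | D for A, B, C, D in program))
    found = []
    for M in subsets(base):
        red = [(A, C) for A, B, C, D in program if B <= M and not D & M]
        if is_model(red, M) and not any(is_model(red, N) for N in subsets(M) if N < M):
            found.append(M)
    return found

P1 = [rule(A={"a"}, B={"a"})]      # a v ~a <-
T1 = tr_head_positive(P1)          # {<- a ^ a°, a° <- ~a, a v a° <-}

print([sorted(m) for m in stable_models(P1)])
print([sorted(m) for m in stable_models(T1)])
```

On this example the stable models of the translation project onto those of the original program when restricted to the original atoms, and the translated models form an anti-chain again.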

Theorem 2. Let $P$ be a disjunctive logic program. If $M \subseteq \mathrm{Hb}(P)$ is a stable
model of $P$, then $M \cup (\mathrm{Hd}^{\sim}(P) - M)^\circ$ is a stable model of $\mathrm{Tr}_{h+}(P)$.

Proof. Let $M$ be a stable model of $P$ and $M' = M \cup (\mathrm{Hd}^{\sim}(P) - M)^\circ$. By the
definitions of $\mathrm{Tr}_{h+}(P)$ and $M'$, the reduct of $\mathrm{Tr}_{h+}(P)$ with respect to $M'$ is

$\{\leftarrow a \land a^\circ \mid a \in \mathrm{Hd}^{\sim}(P)\} \cup \{a^\circ \leftarrow\ \mid a \in \mathrm{Hd}^{\sim}(P) - M\} \cup \{A \cup B^\circ \leftarrow C \mid A \cup B^\circ \leftarrow C \land \sim D \in \mathrm{Tr}_{h+}(P) \text{ and } D \cap M' = \emptyset\}$.    (4)

The rules of the forms $\leftarrow a \land a^\circ$ and $a^\circ \leftarrow$ in (4) are satisfied in $M'$ directly
by the definition of $M'$. Let us then assume that one of the rules $A \cup B^\circ \leftarrow C$
in (4) is not satisfied in $M'$, i.e., $C \subseteq M'$ and $(A \cup B^\circ) \cap M' = \emptyset$. It follows by
the definition of $M'$ that $C \subseteq M$, $A \cap M = \emptyset$ and $B \subseteq M$. Also $D \cap M = \emptyset$ holds
by (4) and the definition of $M'$. Thus the rule $A \leftarrow C$ belongs to $P^M$ and it is
not satisfied by $M$. Thus $M$ is not a model of $P^M$, a contradiction. Hence the
rule $A \cup B^\circ \leftarrow C$ is satisfied by $M'$. To conclude, we have established that $M'$
is a model of the reduct (4). It remains to establish the minimality of $M'$.

So let us assume that $M'$ is not a minimal model of (4), i.e., there is a
model $N'$ of (4) such that $N' \subset M'$. Now $N'$ and $M'$ must coincide on the atoms
of $\mathrm{Hd}^{\sim}(P)^\circ$, because $N' \subset M'$, $N'$ is a model of (4), and the rule $a^\circ \leftarrow$ is included
in (4) for each $a \in \mathrm{Hd}^{\sim}(P) - M$. Thus $N \subset M$ holds for $N = N' \cap \mathrm{Hb}(P)$. Then
assume that $N$ is not a model of $P^M$, i.e., there is a rule $A \leftarrow C \in P^M$ such
that $C \subseteq N$ and $A \cap N = \emptyset$. So there is a rule $A \lor \sim B \leftarrow C \land \sim D$ in $P$ such
that $B \subseteq M$ and $D \cap M = \emptyset$. Consequently, $A \cup B^\circ \leftarrow C \land \sim D$ belongs to
$\mathrm{Tr}_{h+}(P)$ and $D \cap M' = \emptyset$, implying that $A \cup B^\circ \leftarrow C$ belongs to (4). Moreover,
it follows by the definitions of $N$ and $M'$ and the relationship $N' \subset M'$ that
$A \cap N' = \emptyset$, $B^\circ \cap N' = \emptyset$ and $C \subseteq N'$. Thus $A \cup B^\circ \leftarrow C$ is not satisfied in $N'$, a
contradiction. Hence $N$ is a model of $P^M$. Then $N \subset M$ implies that $M$ is not
a minimal model of $P^M$, contradicting the stability of $M$. Thus $M'$ is a minimal
model of (4), i.e., a stable model of $\mathrm{Tr}_{h+}(P)$. □

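Theorem 3. Let $P$ be a disjunctive logic program. If $M' \subseteq \mathrm{Hb}(P) \cup \mathrm{Hd}^{\sim}(P)^\circ$
is a stable model of $\mathrm{Tr}_{h+}(P)$, then $M = M' \cap \mathrm{Hb}(P)$ is a stable model of $P$.

Proof. Let $M' \subseteq \mathrm{Hb}(P) \cup \mathrm{Hd}^{\sim}(P)^\circ$ be a stable model of $\mathrm{Tr}_{h+}(P)$ and define
$M = M' \cap \mathrm{Hb}(P)$. Consider any $a \in \mathrm{Hd}^{\sim}(P)$. (i) Suppose that $a \in M$ and
$a^\circ \in M'$. Then $a \in M'$ and $\leftarrow a \land a^\circ \in \mathrm{Tr}_{h+}(P)^{M'}$ is not satisfied in $M'$, a
contradiction. (ii) Then assume that $a \notin M$ and $a^\circ \notin M'$. Since
$a \in \mathrm{Hd}^{\sim}(P) \subseteq \mathrm{Hb}(P)$ and $M = M' \cap \mathrm{Hb}(P)$, it follows that $a \notin M'$. This implies
that $a^\circ \leftarrow$ belongs to $\mathrm{Tr}_{h+}(P)^{M'}$. Since $M'$ is a model of $\mathrm{Tr}_{h+}(P)^{M'}$, it holds
necessarily that $a^\circ \in M'$, a contradiction. Now (i) and (ii) imply for any $a \in \mathrm{Hd}^{\sim}(P)$ that
$a \notin M \iff a^\circ \in M'$. Thus $M' = M \cup (\mathrm{Hd}^{\sim}(P) - M)^\circ$.

Then consider any rule $A \lor \sim B \leftarrow C \land \sim D$ of the original program $P$. Now
(iii) $A \leftarrow C \in P^M$ $\iff$ $B \subseteq M$ and $D \cap M = \emptyset$ $\iff$ $B^\circ \cap M' = \emptyset$ and
$D \cap M' = \emptyset$ $\iff$ $B^\circ \cap M' = \emptyset$ and $A \cup B^\circ \leftarrow C \in \mathrm{Tr}_{h+}(P)^{M'}$.

Let us then assume that $M$ is not a model of $P^M$. So there is a rule $A \leftarrow C$
in $P^M$ such that $C \subseteq M$ and $A \cap M = \emptyset$. This implies by (iii) that $B^\circ \cap M' = \emptyset$
and the rule $A \cup B^\circ \leftarrow C$ belongs to $\mathrm{Tr}_{h+}(P)^{M'}$. It follows that $C \subseteq M'$ and
$(A \cup B^\circ) \cap M' = \emptyset$. Thus $A \cup B^\circ \leftarrow C$ is not satisfied in $M'$, i.e., $M'$ is not a
model of $\mathrm{Tr}_{h+}(P)^{M'}$, a contradiction. Hence $M$ is a model of $P^M$.

Finally, let us assume that $M$ is not a minimal model of $P^M$. Then there is a
model $N$ of $P^M$ such that $N \subset M$. Define an interpretation $N' = N \cup (\mathrm{Hd}^{\sim}(P) - M)^\circ$ so
that $N' \subset M'$ is the case. Let us assume that $N'$ is not a model of $\mathrm{Tr}_{h+}(P)^{M'}$,
i.e., the reduct contains a rule which is not satisfied in $N'$. Three cases arise.
(a) A rule $\leftarrow a \land a^\circ$ of $\mathrm{Tr}_{h+}(P)^{M'}$ is false in $N'$. This implies that $a \in \mathrm{Hd}^{\sim}(P)$,
$a \in N'$ and $a^\circ \in N'$. By the relationship $N' \subset M'$, we obtain that $a \in M'$ and
$a^\circ \in M'$, a contradiction. (b) A rule $a^\circ \leftarrow$ of $\mathrm{Tr}_{h+}(P)^{M'}$ is false in $N'$. It follows
that $a \in \mathrm{Hd}^{\sim}(P)$ and $a^\circ \notin N'$ so that $a^\circ \notin M'$ holds, as $N'$ and $M'$ coincide on
the atoms of $\mathrm{Hd}^{\sim}(P)^\circ$. Then $M'$ is not a model of $\mathrm{Tr}_{h+}(P)^{M'}$, a contradiction.
(c) A rule $A \cup B^\circ \leftarrow C$ of $\mathrm{Tr}_{h+}(P)^{M'}$ is false in $N'$. It follows that $C \subseteq N'$ and
$(A \cup B^\circ) \cap N' = \emptyset$ so that $C \subseteq N$ and $A \cap N = \emptyset$. Moreover, $B^\circ \cap M' = \emptyset$ holds,
as $N'$ and $M'$ coincide on the atoms of $\mathrm{Hd}^{\sim}(P)^\circ$. Thus $A \leftarrow C$ belongs to $P^M$
by (iii). In addition, this particular rule is not satisfied in $N$ which contradicts
the fact that $N$ is a model of $P^M$.

By the preceding case analysis, $N'$ is a model of $\mathrm{Tr}_{h+}(P)^{M'}$, which contradicts
the minimality of $M'$, since $N' \subset M'$. Hence $M$ is a minimal model of $P^M$ and a stable
model of $P$. □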

Theorem 4. $\mathcal{D} \leq_{PFM} \mathcal{D}^{h+}$.

Proof. It is obvious that $\mathrm{Tr}_{h+}$ is polynomial and modular. To establish faithfulness
we note that Theorem 2 gives rise to a mapping $f_1$ that maps a stable
model $M$ of $P$ to a stable model $f_1(M) = M \cup (\mathrm{Hd}^{\sim}(P) - M)^\circ$ of $\mathrm{Tr}_{h+}(P)$. Then
consider any two stable models $M$ and $N$ of $P$ such that $f_1(M) = f_1(N)$. It follows
that $M = N$ so that $f_1$ is injective. On the other hand, a mapping $f_2$ that
maps a stable model $M'$ of $\mathrm{Tr}_{h+}(P)$ to a stable model $f_2(M') = M' \cap \mathrm{Hb}(P)$
of $P$ is obtained from Theorem 3. If we have two stable models $M'$ and $N'$
of $\mathrm{Tr}_{h+}(P)$ such that $M = f_2(M') = f_2(N') = N$, it follows by the proof of
Theorem 3 that $M' = M \cup (\mathrm{Hd}^{\sim}(P) - M)^\circ = N \cup (\mathrm{Hd}^{\sim}(P) - N)^\circ = N'$. This
indicates that $f_2$ is injective. Thus it is clear that $f_1$ and $f_2$ are bijective and
inverses of each other. Consequently, the stable models of $P$ and $\mathrm{Tr}_{h+}(P)$ are in
a one-to-one correspondence and they coincide up to $\mathrm{Hb}(P)$. □

Having established the equivalence of $\mathcal{D}$ and $\mathcal{D}^{h+}$, we are ready to proceed to
the analysis of body-positive programs. Recall that any $P \in \mathcal{D}^{b+}$ is a set of rules
of the form $A \lor \sim B \leftarrow C$. By the denial of negative subgoals in the bodies of rules,
the semantic definitions are simplified accordingly. Given $P \in \mathcal{D}^{b+}$ and a model
candidate $M \subseteq \mathrm{Hb}(P)$, the reduct $P^M$ contains a rule $A \leftarrow C$ whenever $B \subseteq M$
for some rule $A \lor \sim B \leftarrow C \in P$. The definition of stable models remains intact,
i.e., $M = \mathrm{MM}(P^M)$. However, the properties of $P^M$ let us establish interesting
results for the programs of $\mathcal{D}^{b+}$ as follows.

Lemma 1. If $Q \in \mathcal{D}^{b+}$, $P \subseteq Q$, and $M_1 \subseteq M_2 \subseteq \mathrm{Hb}(Q)$, then $P^{M_1} \subseteq Q^{M_2}$.

Indeed, the reduct $P^M$ grows monotonically with respect to $P$ and $M$. This
is in contrast with head-positive programs $P \in \mathcal{D}^{h+}$ that satisfy $P^{M_2} \subseteq P^{M_1}$
for $M_1 \subseteq M_2 \subseteq \mathrm{Hb}(P)$. The monotonicity properties of $P^M$ let us extend
well-known properties of minimal models of positive disjunctive programs to
cover stable models of body-positive disjunctive programs.
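
Lemma 1 can be sanity-checked exhaustively on a small instance. In the Python sketch below, the body-positive program Q is a made-up example and the triple encoding (A, B, C) of a rule $A \lor \sim B \leftarrow C$ is an assumption for illustration; the check runs over every subprogram $P \subseteq Q$ and every pair $M_1 \subseteq M_2$. It tests one instance only and proves nothing in general.

```python
from itertools import combinations

def subsets(items):
    """Yield every subset of `items` as a frozenset, smallest first."""
    items = sorted(items)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)

def reduct(program, M):
    """Reduct of a body-positive program: keep A <- C whenever B <= M."""
    return {(A, C) for A, B, C in program if B <= M}

f = frozenset
# Hypothetical body-positive program Q over the atoms a, b, c.
Q = [
    (f({"a"}), f({"b"}), f()),               # a v ~b <-
    (f({"b"}), f(), f({"a"})),               # b <- a
    (f({"c"}), f({"a", "b"}), f({"b"})),     # c v ~a v ~b <- b
]
BASE = f({"a", "b", "c"})

def lemma1_holds(Q, base):
    """Check P^{M1} <= Q^{M2} for all P <= Q and all M1 <= M2 <= base."""
    for k in range(len(Q) + 1):
        for P in combinations(Q, k):
            for M2 in subsets(base):
                for M1 in subsets(M2):
                    if not reduct(P, M1) <= reduct(Q, M2):
                        return False
    return True

print(lemma1_holds(Q, BASE))
```

The reduct with respect to the empty set keeps only the rules without negative head literals, and every larger candidate can only add rules.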

Lemma 2. If $P \in \mathcal{D}^{b+}$ and $M \subseteq \mathrm{Hb}(P)$ is a model of $P^M$, then $P$ has a stable
model $N \subseteq \mathrm{Hb}(P)$ such that $N \subseteq M$.

Proof sketch. Let $M \subseteq \mathrm{Hb}(P)$ be a model of $P^M$ for $P \in \mathcal{D}^{b+}$. Then we may
use transfinite induction to construct a descending sequence of interpretations
$M_0 \supseteq M_1 \supseteq M_2 \supseteq \ldots$ such that (i) $M_0 = M$, (ii) $M_\alpha \subseteq M_{\alpha-1}$ can be chosen
as $\mathrm{MM}(P^{M_{\alpha-1}})$ for a successor ordinal $\alpha$, and (iii) $M_\alpha$ is defined as the limit
$\bigcap_{\beta<\alpha} M_\beta$ for a limit ordinal $\alpha$. The construction can be done so that $M_\alpha$ remains
a model of $P^{M_\alpha}$ for any ordinal $\alpha$. Moreover, it follows for a sufficiently large
successor ordinal $\alpha$ ($|\alpha| > |\mathrm{Hb}(P)|$) that $M_\alpha = M_{\alpha-1}$. This implies by (ii)
that $M_\alpha = \mathrm{MM}(P^{M_\alpha})$ so that $N = M_\alpha$ is a stable model of $P$. In an extreme
case, $N$ may become empty. This is demonstrated in Example 2. □

Example 2. Consider an infinite body-positive disjunctive logic program
$P = \{b_i \lor \sim b_{i-1} \leftarrow\ \mid i > 0\}$. It is clear that $M_0 = \mathrm{Hb}(P) = \{b_i \mid i \geq 0\}$ is a
model of $P^{M_0} = \{b_i \leftarrow\ \mid i > 0\}$, but not a minimal one, as $M_1 = \{b_i \mid i \geq 1\}$ is the
unique minimal model of $P^{M_0}$. Similarly, for any $j > 0$, $M_{j-1} = \{b_i \mid i \geq j-1\}$ is not
a minimal model of $P^{M_{j-1}} = \{b_i \leftarrow\ \mid i \geq j\}$, but $M_j = \{b_i \mid i \geq j\}$ is. It follows
that $\bigcap_{i \geq 0} M_i = \emptyset$ and $N = \emptyset$ is a stable model of $P$. This is obvious, since the
reduct $P^{\emptyset} = \emptyset$. Note that $N$ is in fact the unique stable model of $P$. □
Proposition 1. Consider $P \in \mathcal{D}^{b+}$ and $Q \in \mathcal{D}^{b+}$ such that $P \subseteq Q$. (i) If $Q$
has a stable model $M \subseteq \mathrm{Hb}(Q)$ then $P$ has a stable model $N \subseteq \mathrm{Hb}(P)$ such that
$N \subseteq M$. (ii) If $P$ has no stable models, neither has $Q$.

Proof. Suppose that $M$ is a stable model of $Q$, i.e., $M = \mathrm{MM}(Q^M)$. Since
$P^{M'} \subseteq Q^M$ holds for $M' = M \cap \mathrm{Hb}(P)$ by Lemma 1, we know that $M$ and $M'$ are models
of $P^{M'}$. Thus $P$ has a stable model $N \subseteq M' \subseteq M$ by Lemma 2. The claim (ii)
of this proposition follows easily from (i) by contraposition. □

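To characterize the expressive power of the class $\mathcal{D}^{b+}$, we note that
$\mathcal{D}^{+} \leq_{PFM} \mathcal{D}^{b+}$ and $\mathcal{D}^{b+} \leq_{PFM} \mathcal{D}$ hold directly by the relationships
$\mathcal{D}^{+} \subset \mathcal{D}^{b+} \subset \mathcal{D}$ and the identity translation function $\mathrm{Tr}_{id}$. The latter
relationship is shown to be a strict one in the following theorem. Thus body-positive
disjunctive programs are strictly less expressive than general as well as head-positive
disjunctive programs, as implied by the fact that $\mathcal{D}^{b+} <_{PFM} \mathcal{D}$ and Theorem 4.

Theorem 5. $\mathcal{D} \not\leq_{FM} \mathcal{D}^{b+}$.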

Proof. Consider $P = \{a \leftarrow \sim a\}$ from $\mathcal{D}$. Suppose there is a faithful and modular
translation function $\mathrm{Tr}$ that maps $P$ to a program $\mathrm{Tr}(P)$ of $\mathcal{D}^{b+}$. Since $P$ has
no stable models, neither has $\mathrm{Tr}(P)$ by the faithfulness of $\mathrm{Tr}$. As $\mathrm{Tr}$ is modular,
we know that $\mathrm{Tr}(P') = \mathrm{Tr}(P) \cup \{a \leftarrow\}$ holds for $P' = P \cup \{a \leftarrow\}$. Thus $\mathrm{Tr}(P')$
has no stable models by Proposition 1. This contradicts the faithfulness of $\mathrm{Tr}$,
since $P'$ has a unique stable model $M = \{a\}$. □

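Let us then address the relationship $\mathcal{D}^{+} \leq_{PFM} \mathcal{D}^{b+}$. Our last theorem provides
a concrete counter-example to establish that body-positive disjunctive programs
are strictly more expressive than positive ones, i.e., $\mathcal{D}^{+} <_{PFM} \mathcal{D}^{b+}$ holds.

Theorem 6. $\mathcal{D}^{b+} \not\leq_{PFM} \mathcal{D}^{+}$.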

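Proof. Consider a body-positive logic program $P = \{a \lor \sim b \leftarrow\}$. Suppose there
is a PFM translation function $\mathrm{Tr}$ from $\mathcal{D}^{b+}$ to $\mathcal{D}^{+}$ that maps $P$ to a positive
program $\mathrm{Tr}(P)$ such that $\mathrm{Hb}(P) \subseteq \mathrm{Hb}(\mathrm{Tr}(P))$. It follows by the modularity of
$\mathrm{Tr}$ that $P \cup \{b \leftarrow a\}$ is translated into $\mathrm{Tr}(P \cup \{b \leftarrow a\}) = \mathrm{Tr}(P) \cup \{b \leftarrow a\}$.
Note that $\mathrm{Hb}(P \cup \{b \leftarrow a\}) = \mathrm{Hb}(P)$ and $\mathrm{Hb}(\mathrm{Tr}(P \cup \{b \leftarrow a\})) = \mathrm{Hb}(\mathrm{Tr}(P))$.

Now $P \cup \{b \leftarrow a\}$ has two stable models $M_1 = \emptyset$ and $M_2 = \{a, b\}$. This
implies by the faithfulness of $\mathrm{Tr}$ that $\mathrm{Tr}(P) \cup \{b \leftarrow a\}$ has exactly two minimal
models $N_1 \subseteq \mathrm{Hb}(\mathrm{Tr}(P))$ and $N_2 \subseteq \mathrm{Hb}(\mathrm{Tr}(P))$ such that $M_1 = N_1 \cap \mathrm{Hb}(P)$
and $M_2 = N_2 \cap \mathrm{Hb}(P)$. It follows that both $N_1$ and $N_2$ are models of $\mathrm{Tr}(P)$,
but not necessarily minimal ones. Consequently, there exist minimal models $N'_1$
and $N'_2$ of $\mathrm{Tr}(P)$ such that $N'_1 \subseteq N_1$ and $N'_2 \subseteq N_2$. Since $P$ has a unique
stable model $M = \emptyset$, it follows by the faithfulness of $\mathrm{Tr}$ that $N'_1$ and $N'_2$ must
be the same minimal model of $\mathrm{Tr}(P)$, say $N'$. Moreover, $N' \subseteq N_1 \cap N_2$ and
$N' \cap \mathrm{Hb}(P) = M = \emptyset$. But then the rule $b \leftarrow a$ is satisfied by $N'$ which is
therefore a model of $\mathrm{Tr}(P) \cup \{b \leftarrow a\}$ such that $N' \subseteq N_1 \cap N_2$. Recall that $N_1$
and $N_2$ form an antichain as minimal models of $\mathrm{Tr}(P) \cup \{b \leftarrow a\}$. It follows that
$N' \subset N_1$ and $N' \subset N_2$, contradicting the minimality of $N_1$ and $N_2$. □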
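
The proof uses two facts about the counter-example: $P = \{a \lor \sim b \leftarrow\}$ has the unique stable model $\emptyset$, and $P \cup \{b \leftarrow a\}$ has exactly the stable models $\emptyset$ and $\{a, b\}$, which are comparable under inclusion. The Python sketch below verifies these facts by brute force. It is illustrative only: the 4-tuple rule encoding and the function names are assumptions, and the translation function of the proof is not modelled.

```python
from itertools import combinations

def subsets(atoms):
    """Yield every subset of `atoms` as a frozenset, smallest first."""
    atoms = sorted(atoms)
    for k in range(len(atoms) + 1):
        for combo in combinations(atoms, k):
            yield frozenset(combo)

def rule(A=(), B=(), C=(), D=()):
    """Rule A v ~B <- C ^ ~D as four frozensets (illustrative encoding)."""
    return tuple(frozenset(s) for s in (A, B, C, D))

def is_model(positive_rules, I):
    return all(not C <= I or bool(A & I) for A, C in positive_rules)

def stable_models(program):
    """Brute force: M is stable if it is a minimal model of the reduct w.r.t. M."""
    base = frozenset().union(*(A | B | C | D for A, B, C, D in program))
    found = []
    for M in subsets(base):
        red = [(A, C) for A, B, C, D in program if B <= M and not D & M]
        if is_model(red, M) and not any(is_model(red, N) for N in subsets(M) if N < M):
            found.append(M)
    return found

P = [rule(A={"a"}, B={"b"})]              # a v ~b <-
P_ext = P + [rule(A={"b"}, C={"a"})]      # P together with b <- a

print([sorted(m) for m in stable_models(P)])
print([sorted(m) for m in stable_models(P_ext)])
```

Since minimal models of a positive program always form an antichain, a faithful positive translation would have to separate the two comparable stable models with new atoms, and the proof shows that modularity rules this out.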
5 Related Work

Antoniou et al. [1] apply a modularity condition when developing normal forms
for Nute's defeasible logic [25]. Since the syntax of defeasible logic is based on
rules, too, it is worth comparing their notion of modularity with the one applied
in this paper. According to Antoniou et al., a translation function $\mathrm{Tr}$ is modular
if $D_1 \cup D_2 \equiv_{L(D_1) \cup L(D_2)} D_1 \cup \mathrm{Tr}(D_2)$ for any defeasible theories $D_1$ and $D_2$.
Here $\equiv$ denotes semantical equivalence, i.e., the theories yield exactly the same
conclusions in the union of the respective languages $L(D_1)$ and $L(D_2)$ of $D_1$
and $D_2$. Similarly, $\mathrm{Tr}$ is correct if $D \equiv_{L(D)} \mathrm{Tr}(D)$ for every $D$, and incremental
if $D_1 \cup D_2 \equiv_{L(D_1) \cup L(D_2)} \mathrm{Tr}(D_1) \cup \mathrm{Tr}(D_2)$ for every $D_1$ and $D_2$. Thus any modular
transformation is also incremental and correct [1]. Note that part (i) of our
definition of modularity in Definition 1 corresponds to incrementality. The main
difference is that our definition of modularity is purely syntactical: a modular
translation need not be faithful (i.e., correct in the terminology of Antoniou
et al.). The notions of faithfulness differ, too, since the skeptical semantics of
defeasible theories is based on proofs rather than models.

Inoue and Sakama [11] present an alternative way of removing default negation
from the heads of rules. Their idea is to translate (1) into $A^* \cup B^* \leftarrow C \cup \sim D$
where $A^* = \{a^* \mid a \in A\}$ and $B^* = \{a^* \mid a \in B\}$ are sets of new atoms. In addition,
the rules $a \leftarrow a^*$, $a^* \leftarrow \{a\} \cup B$, $\leftarrow b \land b^*$, and $\leftarrow a^* \land \sim b$ have to
be introduced for each $a \in A$ and $b \in B$. The resulting translation function
$\mathrm{Tr}_{IS}$ is clearly modular, but quadratic in the worst case. In contrast to this, the
translation function $\mathrm{Tr}_{h+}$ in Definition 2 is linear. The translational idea behind
$\mathrm{Tr}_{h+}$ is also simpler than that of $\mathrm{Tr}_{IS}$. Nevertheless, Inoue and Sakama [11, Remark 6.3]
note that the stable models of a disjunctive program $P$ and $\mathrm{Tr}_{IS}(P)$ are in a
one-to-one correspondence. Thus $\mathrm{Tr}_{IS}$ is also PFM and Theorem 4 is also implied
by the results in [11]. Inoue and Sakama [11, Section 6.3] propose yet another
polynomial and modular translation function for removing default negation from
head-positive programs. However, Theorem 1 implies that this translation function
cannot be faithful. This explains why Inoue and Sakama need an additional stability condition
on minimal models to establish faithfulness. The resulting semantics of positive
programs is expressive enough to capture head-positive programs.

Eiter and Gottlob [4] study the computational complexity of disjunctive logic
programs by ranking the main decision problems (brave and cautious reasoning
with stable/minimal models) in the polynomial time hierarchy (PH). To summarize
their results, these decision problems of positive and head-positive programs are
complete problems on the second level of PH. By the tight semantical correspondences
embodied in the relationships $\mathcal{D} \leq_{PFM} \mathcal{D}^{h+}$ and
$\mathcal{D}^{+} \leq_{PFM} \mathcal{D}^{b+} \leq_{PFM} \mathcal{D}$, these
results extend to the classes allowing default negation in the heads of rules, too.

Corollary 1. For disjunctive programs in $\mathcal{D}$ and $\mathcal{D}^{b+}$, (i) brave reasoning with
stable models forms a $\Sigma_2^p$-complete decision problem, and (ii) cautious reasoning
with stable models forms a $\Pi_2^p$-complete decision problem.

The results concerning the class $\mathcal{D}$ appeared first in [11, Theorem 6.4]. Further
differences in expressive power can be detected if the computational complexity
of checking the existence of stable models is taken into consideration. For
disjunctive programs in $\mathcal{D}^{h+}$ this forms a $\Sigma_2^p$-complete decision problem. The
same holds for $\mathcal{D}$ by the relationship $\mathcal{D} =_{PFM} \mathcal{D}^{h+}$ as well as [11, Theorem 6.4].
However, for any disjunctive program $P$ from $\mathcal{D}^{+}$ it is sufficient to find one (even
non-minimal) model to solve this decision problem. This is an indication of the
fact that the problem is $\Sigma_1^p$-complete (i.e., NP-complete) [4]. By Lemma 2 and
the relationship $\mathcal{D}^{+} \leq_{PFM} \mathcal{D}^{b+}$, we may conclude that the corresponding decision
problem is also $\Sigma_1^p$-complete for the class of body-positive programs $\mathcal{D}^{b+}$.

6 Conclusions and Further Research 

In this paper, we apply a framework based on polynomial, faithful and modular
(PFM) translation functions to study the effect of default negation on the
expressiveness of disjunctive rules. Three subclasses of the class of disjunctive logic
programs $\mathcal{D}$ are identified: the classes of positive programs $\mathcal{D}^{+}$, head-positive
programs $\mathcal{D}^{h+}$ and body-positive programs $\mathcal{D}^{b+}$. To summarize the relationships
established by Theorems 1, 4, 5, and 6, we have obtained an expressive
power hierarchy (EPH) for disjunctive programs with the following structure:
$\mathcal{D}^{+} <_{PFM} \mathcal{D}^{b+} <_{PFM} \mathcal{D}^{h+} =_{PFM} \mathcal{D}$. Therefore, we conclude that permitting
default negation in the heads of rules does not increase the expressive power of rules
given that default negation is allowed in the bodies of rules. The translation function
$\mathrm{Tr}_{h+}$ given in Definition 2 removes such occurrences of default negation in
a straightforward way. However, this is no longer possible when default negation
is banned in the bodies of rules so that the expressive power of body-positive
disjunctive programs exceeds that of positive disjunctive programs. Moreover, it
is clear by the structure of the hierarchy that the expressive power of rules is
properly increased by introducing default negation in the bodies of rules.

However, the expressive powers of the four classes are the same if measured by 
the computational complexity of brave and cautious reasoning with stable mod- 
els. This follows by the results of Eiter and Gottlob [4], Inoue and Sakama [11], 
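and this paper (Corollary 1). On the other hand, the classes $\mathcal{D}^{+}$ and $\mathcal{D}^{b+}$ can
be differentiated from the classes $\mathcal{D}^{h+}$ and $\mathcal{D}$ if the complexity of deciding the
existence of a stable model is taken into account, but $\mathcal{D}^{+}$ and $\mathcal{D}^{b+}$ remain
equivalent even under this additional measure. Since $\mathcal{D}^{+} <_{\mathrm{PFM}} \mathcal{D}^{b+}$ holds, we conclude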
that the measure based on PFM translations provides a refined view on the 
expressiveness of disjunctive rules involving default negation. This is because 
polynomial transformations involved in PH preserve just the plain yes/no
answers of decision problems. Compared to this, the notions of faithfulness and
modularity (cf. Definition 1) constitute a much stronger constraint. Let us also
point out that the hierarchy EPH deduced in this paper remains valid even in the 
unlikely event that the complexity classes P and NP coincide and PH collapses. 

It is to be expected that the results of this paper can be extended and
generalized in several ways. (i) Currently, our results do not cover the classes of
extended disjunctive programs where classical literals, i.e., atoms a and their
classical negations ¬a, may appear wherever atoms appear in ordinary
disjunctive rules (1). In order to generalize our framework for the case of extended
disjunctive programs, we have to extend languages associated with disjunctive 
programs and revise the notion of faithfulness accordingly. The basic technique 
for obtaining translations will be the one by Gelfond and Lifschitz [6]: classical 
negative literals are simply rewritten as new atoms. (ii) Furthermore, it will be
interesting to study the effect of integrity constraints, i.e., rules (1) with $A = \emptyset$,
using the framework proposed in this paper. (iii) The current notion of
modularity could be split in two, i.e., notions of weak and strong modularity. The latter
would correspond to the current notion while the former could be introduced to 
strengthen intranslatability results. For instance, the proof of Theorem 1 remains 
valid even if we introduce a notion of modularity requiring that head-positive 
rules can be translated separately from rules that are not head-positive. (iv) So
far our analysis covers only the stable semantics, but a wide variety of alter- 
native semantics for disjunctive logic programs have been proposed (see, e.g., 
[2,3,21,27,31]). Due to our recent experiences with non-monotonic logics [14], 
we expect (in)translatability results regarding other semantics as well. Our first 
results in this respect on Przymusinski’s partial stable models [26] can be found 
in [16]. 
Abstract. Schlipf [Sch95] proved that Stable Logic Programming (SLP)
solves all NP decision problems. We extend Schlipf's result to prove that
SLP solves all search problems in the class NP. Moreover, we do this
in a uniform way as defined in [MT99]. Specifically, we show that there
is a single DATALOG$^{\neg}$ program $P_{Trg}$ such that given any Turing
machine M, any polynomial p with non-negative integer coefficients and any
input $\sigma$ of size n over a fixed alphabet $\Sigma$, there is an extensional database
$edb_{M,p,\sigma}$ such that there is a one-to-one correspondence between the
stable models of $edb_{M,p,\sigma} \cup P_{Trg}$ and the accepting computations of the
machine M that reach the final state in at most p(n) steps. Moreover,
$edb_{M,p,\sigma}$ can be computed in polynomial time from p, $\sigma$ and the
description of M, and the decoding of such an accepting computation from
its corresponding stable model of $edb_{M,p,\sigma} \cup P_{Trg}$ can be computed in
linear time. A similar statement holds for Default Logic with respect to
$\Sigma_2^P$-search problems.

We also show that there is a single program Meta which is a
metainterpreter for SLP programs. That is, for any program Q, there is an
encoding of Q as an extensional database $edb_Q$ such that the stable
models of $Meta \cup edb_Q$ are in one-to-one correspondence with the stable
models of Q.

1 Introduction 

The main motivation for this paper comes from recent developments in
Knowledge Representation, especially the appearance of a new generation of
systems [CMT96,NS96,ELM+97] based on the so-called Answer Set Programming
(ASP) paradigm [Nie98,CP98,MT99,Lif98]. In particular, these systems suggest
that we need to revisit one of the basic issues in the foundations of ASP, namely, 
how can we characterize what such ASP systems can theoretically compute. 
Throughout this paper, we shall focus mostly on one particular ASP formalism, 
namely, the Stable Semantics for Logic Programs (SLP) [GL88]. We note that 
the underlying methods of ASP are similar to those used in Logic
Programming [Ap90] and Constraint Programming [JM94,MS99]. That is, like Logic
Programming, ASP is a declarative formalism and the semantics of all ASP 
systems are based on logic. Like Constraint Programming, certain clauses of an 
ASP program act as constraints. There is a fundamental difference between ASP 
programs and Constraint Logic programs, however. That is, in Constraint
Programming, the constraints act on individual elements of the Herbrand base of the
program while the constraint clauses in ASP programs act more globally in that
they place restrictions on what subsets of the Herbrand base can be acceptable
answers for the program. For example, suppose that we have a problem $\Pi$ whose
solutions are subsets of some Herbrand base H. In order to solve the problem,
an ASP programmer essentially writes a logic program P that describes the
constraints on the subsets of H which can be answers to $\Pi$. The basic idea is that the
program P should have the property that there is an easy decoding of solutions
of $\Pi$ from stable models of P and that all solutions of $\Pi$ can be obtained from
stable models of P through this decoding. The program P is then submitted
to an ASP engine such as smodels [NS96], dlv [ELM+97] or DeReS [CMT96]
which computes the stable models of the program P. Thus the ASP engine finds
the stable models of the program (if any exist) and we read off the solutions
to $\Pi$ from these stable models. Notice that the idea here is that all solutions
are equally good in the sense that any solution found in the process described
above is acceptable. Currently, the systems based on the ASP paradigm are being
tested on problems related to planning, product configuration, combinatorial
optimization and other domains.

It is a well known fact that the semantics of existing Logic Programming
systems such as Prolog, Mercury and LDL have serious problems. For instance, the
unification algorithm used by most dialects of Prolog does not enforce the occurs
check and hence these systems can produce incorrect results [AP94]. Moreover,
the processing strategies of Prolog and similar languages have the effect that 
correct logic programs can be non-terminating [AP93]. While good program- 
ming techniques can overcome these problems, it is clear that such deficiencies 
have restricted the appeal of the Logic Programming systems for ordinary pro- 
grammers and system analysts. The promise of ASP and, in particular, of SLP 
and its extensions, such as Disjunctive Logic Programming [GL91,ELM+97], is 
that a new generation of logic programming systems can be built which have a 
clear semantics and are easier to program than the previous generation of Logic 
Programming systems. In particular, both of the problems referred to above, 
namely, the occurs check problem and the termination problem, do not exist in 
SLP. Of course, there is a price to pay, namely, SLP systems only accept pro- 
grams without function symbols. Consequently, one of the basic data structures 
used in Prolog, specifically, the term, is not available in SLP. Thus SLP systems 
require the programmer to explicitly construct many data structures. In SLP 
programming, predicates are used to construct the required data structures and 
clauses that serve as constraints are used to ensure that the predicates behave 
properly with respect to semantics of the program. SLP programs are always 
terminating because the Herbrand base is finite and hence there are only a finite 
number of stable models. In addition, unlike the case of usual Logic
Programming, the order of the clauses of the program does not affect the set of stable
models of the program¹. Finally, the stable semantics of logic programs is well
understood so that SLP programs have clear semantics.

We note that the restriction that ASP programs do not allow function sym- 
bols is crucial. First, it is well known that once one allows function symbols in 
a logic program P, the Herbrand base becomes infinite. Moreover, the stable 
models of logic programs with function symbols can be immensely complex. For 
example, for stratified logic programs [ABW88,Prz88], the perfect model is the 
unique stable model of that program [GL88] . Apt and Blair [AB90] showed that 
perfect models of stratified logic programs capture precisely the arithmetic sets. 
That is, they show that for a given arithmetic set X of natural numbers, there is 
a finite stratified logic program Px such that in the perfect model of Px, some 
predicate px is satisfied by precisely the numbers in X. This was the first result 
that showed that it is not possible to have meaningful practical programming 
with general stratified programs if we allow unlimited use of function symbols. 
The result of [AB90] was extended in [BMS95] where Blair, Marek, and Schlipf 
showed that the set of stable models of a locally stratified program can capture 
any set in the hyperarithmetic hierarchy. Marek, Nerode, and Remmel [MNR94]
showed that the problem of finding a stable model of a finite (predicate) logic
program P is essentially equivalent to finding a path through an infinite
branching recursive tree. That is, given an infinite branching recursive tree $T \subseteq \omega^{<\omega}$,
there is a finite program $P_T$ such that there is a one-to-one degree preserving
correspondence between the infinite paths through T and the stable models of $P_T$
and, vice versa, given a finite program P, there is a recursive tree $T_P$ such that
there is a one-to-one degree preserving correspondence between the stable models
of P and the infinite paths through $T_P$. One consequence of this result is that
the problem of determining whether a finite predicate logic program has a stable
model is $\Sigma_1^1$-complete. More results on the structure of the family of stable
models of programs can be found in [CR99].

All the results mentioned in the previous paragraph show that stable seman- 
tics for logic programs admitting function symbols can be used only in a very 
limited setting. This is precisely what the XSB system attempts to do. When 
well-founded semantics is total, the resulting model is the unique stable model 
of the program. XSB attempts to query that model. Unfortunately, the class of 
programs for which it succeeds is not intuitive [RRS+97]. ASP systems propose
a more radical solution to the problem of complexity of stable models of logic
programs with function symbols, namely, abandoning function symbols entirely.
Once this is accepted, the semantics of a logic program P can be defined in two
stages. First, we assume, as in standard Logic Programming, that we interpret P
over the Herbrand universe of P determined by the predicates and constants
that occur in P. Since the set of constants occurring in the program is finite,
we can ground the program in these constants to obtain a finite propositional
logic program $P_g$. The stable models of P are by definition the stable models of $P_g$.
The process of grounding is performed by a separate grounding engine such as
lparse [NS96]. The grounded program is then passed to the engine computing the
stable models. It is then easy to check that the features of SLP mentioned above,
i.e., the absence of occurs check and termination problems and the independence
of the semantics from the ordering of the clauses of the program, automatically
hold.

¹ However, the order of the clauses can affect the processing time
of the ASP engine.

The language of logic programming without function symbols was studied by
the database community with the hope that it could lead to a new, more powerful
database language [Ull88]. This language is called DATALOG and some database
systems such as DB2 implement the positive part of DATALOG. The fact that
admitting negation in the bodies of clauses leads to multiple stable models was
unacceptable from the database perspective. Hence the database community
preferred other semantics for DATALOG programs with negation such as the
well-founded semantics [VRS91] or the inflationary semantics [AHV95].

The main purpose of this paper is to revisit the question of what can be
computed by logic programs without function symbols under the stable model
semantics. First, consider the case of finite propositional programs. Here the
situation is simple. Given a set At of propositional atoms, let $\mathcal{F}$ be a finite
antichain of subsets of At (i.e., whenever $X, Y \in \mathcal{F}$ and $X \subseteq Y$, then $X = Y$).
Then one can show that there is a logic program $P_{\mathcal{F}}$ such that $\mathcal{F}$ is precisely the
class of all stable models of $P_{\mathcal{F}}$ [MT93]. Moreover, the family of stable models of
any program P forms such an antichain. Thus in the case of finite propositional
logic programs, we have a complete characterization of the possible sets of stable
models. Note, however, that this result does not tell us anything about the uniformity
and the effectiveness of the construction. The basic complexity result for SLP
propositional programs is due to Marek and Truszczyński [MT91] who showed
that the problem of deciding whether a finite propositional logic program has a
stable model is NP-complete.

To formulate our question about what can be computed by logic programs
without function symbols under the stable model semantics, we first recall the
notion of a search problem [GJ79] and of a uniform logic program [MT99]. A search
problem is a set S of finite instances [GJ79] such that, given any instance $I \in S$,
we have a set $S_I$ of solutions to S for instance I. For example, the search problem
may be to find Hamiltonian paths in a graph. Thus, the set of instances of the
problem is the set of all finite graphs and, for any given instance I, $S_I$ is the
set of all Hamiltonian paths of I. An algorithm solves the search problem S
if it returns a solution $s \in S_I$ whenever $S_I$ is non-empty and it returns the
string “empty” otherwise. We say that a search problem S is in NP if there is
such an algorithm which can be computed by a non-deterministic polynomial
time Turing machine. We say that a search problem S is solved by a uniform logic
program if there exists:

1. a polynomial time encoding $edb_S$ under which every instance I of S is mapped
to a finite set of facts, i.e. clauses with empty bodies and no variables, and

2. a single logic program $P_S$ such that there is a polynomial time computable
function $sols(\cdot, \cdot)$ such that for every instance I of S, $sols(I, \cdot)$ maps the set of
stable models of $edb_S(I) \cup P_S$ onto the set of solutions $S_I$ of I.
We note that decision problems can be viewed as special cases of search
problems. Schlipf [Sch95] has shown that the class of decision problems in NP 
is captured precisely by uniform logic programs. Specifically he proved that a 
decision problem is solved by a uniform logic program if and only if it is in 
NP. An excellent review of the complexity and expressivity results for Logic 
programming can be found in [DEGV99]. 

The goal of this paper is to prove a strengthening of Schlipf's result as well
as a number of related facts. First, we will prove that Schlipf's result
can be extended to all NP search problems. That is, we shall show that there
is a single logic program $P_{Trg}$ that is capable of simulating polynomial time
nondeterministic Turing machines in the sense that given any polynomial time
nondeterministic Turing machine M, any input $\sigma$, and any run-time polynomial
p(x), there is a set of facts $edb_{M,p,\sigma}$ (depending on M, p(x) and $\sigma$) such that a
stable model of $P_{Trg} \cup edb_{M,p,\sigma}$ codes an accepting computation of M started
with input $\sigma$ that terminates in $p(|\sigma|)$ steps and any such accepting computation
of M is coded by some stable model of $P_{Trg} \cup edb_{M,p,\sigma}$. This result will show
that logic programs without function symbols under the stable model semantics
capture all NP-search problems². The converse implication, that is, that a search
problem computed by a uniform logic program P is an NP-search problem, is
obvious since one can compute a stable model SM of a program by first guessing
SM and then doing a polynomial time check that SM is a stable model of the
program.

2 Technical Preliminaries 

In this section we formally introduce several notions that will be needed for the
proof of our main result. Our proof of this result uses essentially the same idea
used by Cook [Co71] in his proof of the NP-completeness of the satisfiability
problem.

First, we introduce the set of logic programs that we will study. We will
consider here only so-called DATALOG$^{\neg}$ programs. Specifically, a clause is an
expression of the form

$p(X) \leftarrow q_1(X), \ldots, q_m(X), \neg r_1(X), \ldots, \neg r_n(X)$    (1)

where $p, q_1, \ldots, q_m, r_1, \ldots, r_n$ are atoms, possibly with variables and/or
constants. A program is a finite set P of clauses of the form (1). Each program
determines its language (based on the predicates occurring in the program). Since
there are no function symbols in our programs, both the Herbrand universe and
the Herbrand base of the program are finite.

² As pointed out by M. Truszczyński, for our goal of describing the complexity of
Stable Logic Programming, a weaker result is sufficient. That is, we need only show
that for each instance I of an NP search problem $\Pi$, there is a program $P_I$ and a
polynomial time projection from the collection of stable models of $P_I$ to the set of
solutions of I. Our result shows that this property holds in a stronger form, namely,
there is a single program with a varying extensional database.
A ground instance of the clause C of the form (1) is the result of a simultaneous
substitution of constants for variables occurring in C. Given a program P, $P_g$
is the propositional program consisting of all ground instances of clauses of P.
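
As an illustration of grounding, the following is a minimal Python sketch, not part of the original construction. It assumes a representation of our own choosing: an atom is a pair (predicate, argument tuple), a clause is a triple (head, positive body, negative body), and variables are strings that start with an upper-case letter. The names `ground_clause` and `ground_program` are hypothetical.

```python
from itertools import product


def is_variable(term):
    # Assumed convention: variables are strings starting with an upper-case letter.
    return isinstance(term, str) and term[:1].isupper()


def substitute(atom, theta):
    # Apply the substitution theta to every argument of the atom.
    predicate, args = atom
    return (predicate, tuple(theta.get(arg, arg) for arg in args))


def ground_clause(clause, constants):
    # Yield every ground instance of one clause over the given constants.
    head, pos, neg = clause
    variables = sorted({arg for atom in (head, *pos, *neg)
                        for arg in atom[1] if is_variable(arg)})
    for values in product(constants, repeat=len(variables)):
        theta = dict(zip(variables, values))
        yield (substitute(head, theta),
               tuple(substitute(atom, theta) for atom in pos),
               tuple(substitute(atom, theta) for atom in neg))


def ground_program(program, constants):
    # Collect the ground instances of all clauses of the program.
    return [g for clause in program for g in ground_clause(clause, constants)]


# p(X) <- q(X, Y), not r(Y), grounded over the constants a and b.
example = [(("p", ("X",)), (("q", ("X", "Y")),), (("r", ("Y",)),))]
instances = ground_program(example, ["a", "b"])
```

A clause with k distinct variables yields |C|^k ground instances over a set C of constants, so for a fixed program the ground program is of size polynomial in the number of constants.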

Given a propositional program P and a set M included in its Herbrand
base, $H_P$, the Gelfond-Lifschitz transformation of P by means of M, GL(P, M),
is the program arising from P as follows. First, eliminate all clauses C
in P such that for some j, $1 \leq j \leq n$, $r_j \in M$. Second, in any remaining clauses,
eliminate all negated atoms. The resulting set of clauses forms a program,
GL(P, M), which is a Horn program and hence possesses a least model $N_M$.
We say that M is a stable model of the propositional program P if $M = N_M$.
Finally, we say that M is a stable model of a program P (now possibly with
variables) if M is a stable model of the propositional program $P_g$.
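
The definition above can be turned into a short check. The following Python sketch is only an illustration under our own assumptions: a propositional clause is a triple (head, positive body, negative body) of hashable atoms, and the function names `gl_reduct`, `least_model` and `is_stable` are hypothetical.

```python
def gl_reduct(program, m):
    # Drop every clause with a negated atom in m, then drop all negated atoms.
    return [(head, tuple(pos)) for head, pos, neg in program
            if not (set(neg) & m)]


def least_model(horn_program):
    # Least model of a Horn program, computed by forward chaining to a fixpoint.
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in horn_program:
            if head not in model and all(atom in model for atom in pos):
                model.add(head)
                changed = True
    return model


def is_stable(program, m):
    # m is stable iff it equals the least model of GL(program, m).
    m = set(m)
    return least_model(gl_reduct(program, m)) == m


# p <- not q   and   q <- not p
choice = [("p", (), ("q",)), ("q", (), ("p",))]
```

For the two-clause program `choice`, the sets {p} and {q} pass the check, while the empty set and {p, q} do not. Both steps of the check run in time polynomial in the size of the program.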

A (nondeterministic) Turing Machine is a structure of the form

$M = (Q, \Sigma, \Gamma, D, \delta, s_0, f)$,

where Q is a finite set of states and $\Sigma$ is a finite alphabet of input symbols.
We assume Q always contains two special states, $s_0$, the start state, and f,
the final state. We assume that there is a special symbol B for “blank” such
that $B \notin \Sigma$. The set $\Gamma = \Sigma \cup \{B\}$ is the set of tape symbols. The set D of
move directions will consist of elements l, r, and $\lambda$ where l is the “move left”
symbol, r is the “move right” symbol and $\lambda$ is the “stay put” symbol. The
function $\delta : Q \times \Gamma \to \mathcal{P}(Q \times \Gamma \times D)$ is the transition function of the machine M.
We can think of $\delta$ as a 5-ary relation. We assume M operates on a one-way infinite
tape where the cells of the tape are labeled from left to right by 0, 1, 2, .... To
visualize the behavior of the machine M, we shall talk about the read-write head
of the machine. At any given time in a computation, the read-write head of M
is always in some state $s \in Q$ and is reading some symbol $p \in \Gamma$. It then picks
an instruction $(s_1, p_1, d) \in \delta(s, p)$ and then replaces the symbol p by $p_1$, changes
its state to state $s_1$, and moves according to d.

Suppose we are given a Turing machine M whose runtimes are bounded by
a polynomial $p(x) = a_0 + a_1 x + \cdots + a_k x^k$ where each $a_i \in N = \{0, 1, 2, \ldots\}$
and $a_k \neq 0$. That is, on any input of size n, an accepting computation terminates
in at most p(n) steps. Then any accepting computation on input $\sigma$ can affect at
most the first p(n) cells of the tape. Thus in such a situation, there is no loss in
only considering tapes of length p(n). Hence in what follows, we shall implicitly
assume that the tape is finite. Moreover, it will be convenient to modify the
standard operation of M in the following ways.

1. We shall assume $\delta(f, a) = \{(f, a, \lambda)\}$ for all $a \in \Gamma$.

2. Given an input x of length n, instead of immediately halting when we first get
to the final state f reading a symbol a, we just keep executing the instruction
(f, a, $\lambda$) until we have completed p(n) steps. That is, we remain in state f, we
never move, and we never change any symbols on the tape after we get to state f.

The main effect of these modifications is that all accepting computations will
run for exactly p(n) steps on an input of size n.
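
To make the effect of the two modifications concrete, here is a small Python sketch of a padded nondeterministic run. It is an illustration under our own assumptions and not the logic program constructed below: the transition relation is a dictionary from (state, symbol) to a set of (state, symbol, direction) triples, the direction λ is written `"stay"`, and the tape is given one extra cell so that the head always has a cell to read. The name `accepting_runs` is hypothetical.

```python
MOVES = {"l": -1, "r": 1, "stay": 0}


def accepting_runs(delta, start, final, blank, word, steps):
    # Enumerate the instruction sequences of length exactly `steps`
    # that leave the machine in the final state.
    tape0 = tuple(word) + (blank,) * (steps + 1 - len(word))

    def options(state, symbol):
        # Modification 1: in the final state the only instruction is to idle.
        if state == final:
            return [(final, symbol, "stay")]
        return sorted(delta.get((state, symbol), ()))

    runs = []

    def extend(state, pos, tape, trace):
        # Modification 2: keep executing until `steps` instructions were run.
        if len(trace) == steps:
            if state == final:
                runs.append(trace)
            return
        for new_state, new_symbol, direction in options(state, tape[pos]):
            new_pos = pos + MOVES[direction]
            if new_pos < 0:
                continue  # no move to the left of cell 0
            new_tape = tape[:pos] + (new_symbol,) + tape[pos + 1:]
            extend(new_state, new_pos, new_tape,
                   trace + [(new_state, new_symbol, direction)])

    extend(start, 0, tape0, [])
    return runs


# A toy machine that scans right and may accept on any cell holding "1".
toy = {
    ("s0", "1"): {("s0", "1", "r"), ("f", "1", "stay")},
    ("s0", "0"): {("s0", "0", "r")},
}
runs = accepting_runs(toy, "s0", "f", "B", "101", 4)
```

On the input 101 with a bound of 4 steps, the toy machine has two accepting runs (one for each occurrence of the symbol 1), and both consist of exactly 4 instructions because of the idling in the final state.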
3 Uniform Coding of Turing Machines
by a Logic Program 

In this section, we shall describe the logic program $P_{Trg}$ and our extensional
database coding $edb_{M,p,\sigma}$ described above. The key to our construction is the
fact that at any given moment of time, the behavior of a Turing machine M
depends only on the current state of the tape, the position of the read-write head
and the set of available instructions. Our coding of Turing machine computation
will reflect this observation.

First, we define the language (i.e. a signature) of the program $P_{Trg}$. The set of
predicates that will occur in our extensional database are the following: time(X)
for “X is a time step”, cell(X) for “X is a cell number”, symb(X) for “X is a
symbol”, state(S) for “S is a state”, i_position(P) for “P is the initial position
of the read-write head”, data(P, Q) for “Initially, the tape stores the symbol Q
at the cell P”, delta(X, Y, X1, Y1, Z) for “the triple (X1, Y1, Z) is an executable
instruction when the read-write head is in state X and reads symbol Y” (thus
delta represents the five-place relation $\delta$), neq(X, Y) for “X ≠ Y”, and succ(X, Y)
for “Y = X + 1”.

Next we describe the constants that will be used in our description of time,
cell numbers, cell contents and specific machines. The last two families of
constants will be “machine-dependent”, since we did not specify any restrictions on
the finite sets Q and $\Sigma$. Thus we have the following set of constant symbols: (1)
0, 1, ..., p(n) where n is the length of the input $\sigma$ and p is the runtime
polynomial, (2) s, for each $s \in Q$. Note that two constants $s_0$ (for the initial state) and f (for
the final state) will be present in every extensional database. (3) x for each $x \in \Sigma$,
and B (blank symbol), and finally (4) r, l, $\lambda$.

This given, we can easily define the extensional database $edb_{M,p,\sigma}$. That is,
given input $\sigma = \sigma_1 \ldots \sigma_n$ and runtime polynomial p(x), we let $edb_{M,p,\sigma}$ consist of
the following set of facts that describe the machine M, the segment of integers
0, ..., p(n) and the initial configuration of the tape.

1. state(s) ← for $s \in Q$

2. symb(x) ← for $x \in \Gamma$

3. delta(s, x, s1, x1, d) ← for every pair $(s, x) \in Q \times \Gamma$ and every triple
$(s1, x1, d) \in \delta(s, x)$

4. succ(i, i + 1) ← for $0 \leq i < p(n)$

5. time(i) ← for $0 \leq i \leq p(n)$

6. cell(i) ← for $0 \leq i \leq p(n) - 1$

7. data(m, $\sigma(m)$) ← for $0 \leq m \leq |\sigma| - 1$

8. data(m, B) ← for $|\sigma| \leq m \leq p(n) - 1$

9. dir(l) ←, dir(r) ←, dir($\lambda$) ←

10. i_position(0) ←

11. neq(a, b) ← for all $a, b \in Q \cup \Gamma \cup \{0, \ldots, p(n)\}$ with $a \neq b$.³

³ Technically, we should use a separate inequality relation for each type, but we will
not use different symbols for these inequality relations.
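
The eleven groups of facts can be generated mechanically. The following Python sketch illustrates this under our own assumptions: facts are pairs (predicate, argument tuple), the direction λ is written `"lam"`, time steps and cells are Python integers, and `bound` stands for the value p(n). The function name `build_edb` is hypothetical.

```python
def build_edb(states, sigma, blank, delta, word, bound):
    # Build the set of facts for machine M, input `word` and bound = p(n).
    gamma = list(sigma) + [blank]
    facts = set()
    facts |= {("state", (s,)) for s in states}                      # item 1
    facts |= {("symb", (x,)) for x in gamma}                        # item 2
    facts |= {("delta", (s, x, s1, x1, d))                          # item 3
              for (s, x), targets in delta.items()
              for (s1, x1, d) in targets}
    facts |= {("succ", (i, i + 1)) for i in range(bound)}           # item 4
    facts |= {("time", (i,)) for i in range(bound + 1)}             # item 5
    facts |= {("cell", (i,)) for i in range(bound)}                 # item 6
    facts |= {("data", (m, word[m] if m < len(word) else blank))    # items 7, 8
              for m in range(bound)}
    facts |= {("dir", (d,)) for d in ("l", "r", "lam")}             # item 9
    facts.add(("i_position", (0,)))                                 # item 10
    universe = list(states) + gamma + list(range(bound + 1))
    facts |= {("neq", (a, b))                                       # item 11
              for a in universe for b in universe if a != b}
    return facts


toy_delta = {
    ("s0", "1"): {("s0", "1", "r"), ("f", "1", "lam")},
    ("s0", "0"): {("s0", "0", "r")},
}
edb = build_edb(["s0", "f"], ["0", "1"], "B", toy_delta, "101", 4)
```

The largest group is neq, which is quadratic in the number of constants, so the number of generated facts is polynomial in the size of the machine description and in p(n).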
The remaining predicates of $P_{Trg}$ are the following: tape(P, Q, T) for “the
tape stores symbol Q at cell P at time T”, position(P, T) for “the read-write
head reads the content of cell P at time T”, state(S, T) for “the read-write
head is in state S at time T” (notice that we have both a unary
predicate state/1 with the content consisting of states, and state/2 to
describe the evolution of the machine), instr(S, Q, S1, Q1, D, T) for “Instruction
(S1, Q1, D) belonging to $\delta(S, Q)$ has been selected for execution at time T”,
otherInstr(S, Q, S1, Q1, D, T) for “Instruction other than (S1, Q1, D) belonging
to $\delta(S, Q)$ has been selected for execution at time T”, instr_def(T) for “there is
an instruction to be executed at time T”, completion for “computation
successfully completed”, and A, a propositional letter.⁴

In the program $P_{Trg}$, there should be no constants. We will not be
absolutely strict in this respect. For ease of presentation, we will use the constants
0, f, and $s_0$. These can easily be eliminated by introducing appropriate unary
predicates. Also we shall write y = x + 1 for succ(x, y). Finally, to simplify the
clauses, we will follow here the notation used in the smodels syntax. That is, we
will use $p(X_1; \ldots; X_k)$ as an abbreviation for $p(X_1), \ldots, p(X_k)$.
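
The abbreviation is purely syntactic. As a small illustration, the following Python sketch expands a body atom written with semicolons into the list of atoms it stands for. It only mirrors the convention as stated in the text; the function name `expand_abbreviation` is hypothetical and the sketch makes no claim about how any particular grounder parses its input.

```python
import re


def expand_abbreviation(atom):
    # "state(S; S1)" stands for the two atoms "state(S)" and "state(S1)".
    match = re.fullmatch(r"\s*(\w+)\(([^()]*)\)\s*", atom)
    if match is None:
        raise ValueError(f"not a flat atom: {atom!r}")
    predicate, args = match.groups()
    return [f"{predicate}({arg.strip()})" for arg in args.split(";")]


expanded = expand_abbreviation("state(S; S'; S1; S2)")
```

Here `expanded` contains the four atoms state(S), state(S'), state(S1) and state(S2), which is how the bodies of the clauses in Groups 4 and 5 below are to be read.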

This given, we are now ready to write the program $P_{Trg}$.

Group 1. Our first four clauses are used to describe the position of the read-
write head at any given time t.

(1.1) position(P, T) ← T = 0, i_position(P)

(1.2) position(P, T1) ← T1 = T + 1, position(P1, T), state(S, T),
      tape(P1, Q, T), instr(S, Q, S1, Q1, D, T), D = l, neq(P1, 0),
      P1 = P + 1

(1.3) position(P, T1) ← T1 = T + 1, position(P1, T), state(S, T),
      tape(P1, Q, T), instr(S, Q, S1, Q1, D, T), D = r, P = P1 + 1

(1.4) position(P, T1) ← T1 = T + 1, position(P, T), state(S, T),
      tape(P1, Q, T), instr(S, Q, S1, Q1, D, T), D = λ

Group 2. Our next three clauses describe how the contents of the tape change
as instructions get executed.

(2.1) tape(P, Q, T) ← T = 0, data(P, Q)

(2.2) tape(P, Q1, T1) ← T1 = T + 1, position(P, T), state(S, T),
      tape(P, Q, T), instr(S, Q, S1, Q1, D, T)

(2.3) tape(P, Q, T1) ← T1 = T + 1, tape(P, Q, T), position(P1, T),
      neq(P, P1)

Group 3. Our next two clauses describe how the state of the read-write head
evolves in time.

(3.1) state(S, T) ← T = 0, S = s0

(3.2) state(S, T1) ← T1 = T + 1, position(P, T), state(S1, T),
      tape(P, Q, T), instr(S1, Q, S, Q1, D, T)

⁴ The propositional letter A will be used whenever we write clauses acting as
constraints. That is, the symbol A will occur in the following syntactical configuration. A
will be the head of some clause, and the negation of A will also occur in the body
of that same clause. In such a situation a stable model cannot satisfy the remaining
atoms in the body of that clause.
Group 4. Our next two clauses describe how we select a unique instruction to
be executed at time T.

(4.1) Selecting instruction at step 0.
      instr(S, Q, S1, Q1, D, T) ← state(S; S1), symb(Q; Q1), dir(D),
      time(T), T = 0, S = s0, i_position(P), tape(P, Q, T),
      delta(S, Q, S1, Q1, D), ¬otherInstr(S, Q, S1, Q1, D, T)

(4.2) Selecting instruction at other steps.
      instr(S, Q, S1, Q1, D, T) ← state(S; S1), symb(Q; Q1),
      dir(D), time(T), position(P, T), state(S, T), tape(P, Q, T),
      delta(S, Q, S1, Q1, D), ¬otherInstr(S, Q, S1, Q1, D, T)

Group 5. Our next set of clauses define the otherInstr predicate and (5.6) and
(5.7) ensure that exactly one instruction is selected for execution at
any given time T.

(5.1) otherInstr(S, Q, S1, Q1, D1, T) ← state(S; S'; S1; S2),
      symb(Q; Q'; Q1; Q2), time(T), dir(D; D2),
      instr(S', Q', S2, Q2, D2, T), neq(S2, S1)

(5.2) otherInstr(S, Q, S1, Q1, D1, T) ← state(S; S'; S1; S2),
      symb(Q; Q'; Q1; Q2), time(T), dir(D; D2),
      instr(S', Q', S2, Q2, D2, T), neq(Q2, Q1)

(5.3) otherInstr(S, Q, S1, Q1, D1, T) ← state(S; S'; S1; S2),
      symb(Q; Q'; Q1; Q2), time(T), dir(D; D2),
      instr(S', Q', S2, Q2, D2, T), neq(D2, D1)

(5.4) otherInstr(S, Q, S1, Q1, D1, T) ← state(S; S'; S1; S2),
      symb(Q; Q'; Q1; Q2), time(T), dir(D; D2),
      instr(S', Q', S2, Q2, D2, T), neq(S', S)

(5.5) otherInstr(S, Q, S1, Q1, D1, T) ← state(S; S'; S1; S2),
      symb(Q; Q'; Q1; Q2), time(T), dir(D; D2),
      instr(S', Q', S2, Q2, D2, T), neq(Q', Q)

(5.6) The definition of the instr_def predicate.
      instr_def(T) ← state(S; S1), symb(Q; Q1), dir(D), time(T),
      instr(S, Q, S1, Q1, D, T)

(5.7) The clause to ensure that there is an instruction to be executed
      at any given time.
      A ← time(T), ¬instr_def(T), ¬A

Group 6. Constraints for the coherence of the computation process.

(6.1) When the task is completed.
      completion ← instr(f, Q, f, Q, λ, p(n))

(6.2) The atom completion belongs to every stable model.
      A ← ¬completion, ¬A
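
The role of the letter A in clauses (5.7) and (6.2) can be checked on a tiny propositional example. The Python sketch below is an illustration under our own assumptions: clauses are triples (head, positive body, negative body), and stable models are found by brute force over all subsets of atoms, which is feasible only for very small programs. The name `stable_models` is hypothetical.

```python
from itertools import combinations


def least_model(horn_program):
    # Forward chaining to the least model of a negation-free program.
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in horn_program:
            if head not in model and all(atom in model for atom in pos):
                model.add(head)
                changed = True
    return model


def stable_models(program):
    # Guess every candidate set, then check it against its reduct.
    atoms = sorted({x for head, pos, neg in program for x in (head, *pos, *neg)})
    found = []
    for size in range(len(atoms) + 1):
        for candidate in combinations(atoms, size):
            m = set(candidate)
            reduct = [(head, pos) for head, pos, neg in program
                      if not (set(neg) & m)]
            if least_model(reduct) == m:
                found.append(m)
    return found


# a <- not b   and   b <- not a   select exactly one of a, b.
choice = [("a", (), ("b",)), ("b", (), ("a",))]
# A <- b, not A   acts as a constraint that forbids b.
constrained = choice + [("A", ("b",), ("A",))]
```

The program `choice` has the two stable models {a} and {b}. After adding the clause with head A, only {a} remains: a candidate containing b but not A fails because the reduct derives A, and a candidate containing A fails because the clause is deleted from the reduct and nothing supports A.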

4 Main Results 

Our first proposition immediately follows from our construction.

Proposition 1. There is a polynomial q so that for every machine M,
polynomial p, and an input $\sigma$, the size of the extensional database $edb_{M,p,\sigma}$ is equal to

In the full version of the paper, we shall prove that for any nondeterministic
Turing Machine M, runtime polynomial p(x), and input $\sigma$ of length n, the stable
models of $edb_{M,p,\sigma} \cup P_{Trg}$ encode the sequences of tapes of length p(n) which
occur in the steps of an accepting computation of M starting on $\sigma$ and that any
such sequence of steps can be used to produce a stable model of $edb_{M,p,\sigma} \cup P_{Trg}$.

Theorem 1. The mapping of Turing machines to DATALOG$^{\neg}$ programs
defined by $M \mapsto edb_{M,p,\sigma} \cup P_{Trg}$ has the property that there is a 1-1 polynomial
time correspondence between the set of stable models of $edb_{M,p,\sigma} \cup P_{Trg}$ and the
set of computations of M of length p(n) ending in the state f.

Corollary 1. A search problem S can be solved by means of a uniform logic
program in SLP if and only if S is an NP-search problem.

One can also show that all supported models of $edb_{M,p,\sigma} \cup P_{Trg}$ are stable.
This fact implies that a similar corollary holds for Supported Logic
Programming, SuLP.

Corollary 2. A search problem S can be solved by means of a uniform logic
program in SuLP if and only if S is an NP-search problem.

Finally we can prove similar results for default logic programs without
function symbols with respect to nondeterministic Turing machines with an oracle
for 3-SAT. It thus follows that a search problem S can be solved by means of
a uniform default logic program if and only if S is in $\Sigma_2^P$. A decision version of
this result has been proved in [CEG97].

Theorem 2. For each $n \in N$ there is a default theory $(W_n, D_n)$ such that for
every 3-SAT oracle Turing machine M, every polynomial $p \in N[x]$ and every
finite input $\sigma$ where $|\sigma| = n$, there is a polynomial-time one-to-one
correspondence between the accepting computations of length p(n) of M on input $\sigma$ and
the Reiter extensions of the default theory $(edb_{M,p,\sigma} \cup W_n, D_n)$.

5 Metainterpreters 

The results of Section 4 suggest that there should be a universal logic
program P_Meta for the stable model semantics in the sense that for any logic
program Q, there exists an extensional database edb_Q describing Q such that
there is a one-to-one correspondence between the stable models of P_Meta ∪ edb_Q
and the stable models of Q. We call such a program a metainterpreter for
SLP programs.

First, we will describe a metainterpreter for the class of so-called 0-2
programs. A propositional program P is a 0-2 program if every clause C of P
has either no positive literal in the body, or exactly 2 positive literals in
the body. Blair proved that 0-2 programs semi-represent all propositional
programs (see [MT93], Ch. 5, for the discussion of semirepresentability). The
following result is due to Blair.






Proposition 2 (Blair). There is a linear-time computable function f that
assigns to each propositional program P a 0-2 program f(P) such that there is a
one-to-one projection from the family of stable models of f(P) to the family of
stable models of P.

We will describe a metainterpreter (which we will call Meta1) that computes
stable models of 0-2 propositional programs. To this end we need a data structure
that expresses the given 0-2 program. The extensional predicates describing the
input program are as follows: atom(·), to describe atoms, clause(·), to describe
clauses, head(·,·), to describe the head of a clause, neg(·,·), to state that an
atom occurs negatively in the body of a clause, first(·,·), to state that an atom
is the first of two positive atoms occurring in the body of a clause, and
second(·,·), to state that an atom is the second of two positive atoms occurring
in the body of a clause.

The description of a propositional program Q (the data for the program
Meta1) consists of the following facts: atom(a) ← for all atoms a occurring
in Q, clause(c) ← for all clauses c of Q, head(a, c) ← whenever a is the head
of clause c in Q, neg(a, c) ← whenever a occurs negatively in the body of
clause c in Q, and first(a, c) ← and second(b, c) ← whenever a and b are the
first and the second atoms in the body of clause c in Q, respectively. We call
this collection edb_Q.

The remaining predicates of Meta1 are the following: nempty(·), to describe
that there are atoms occurring positively in the body of a clause, empty(·), to
describe that there are no atoms occurring positively in the body of a clause,
in(·), to describe the stable model of the input program itself, out(·), to
describe the complement of the stable model of the input program, unusable(·),
to describe the clauses not involved in the computation of the stable model,
usable(·), to describe the clauses involved in the computation of the stable
model, computed(·), to describe the computed atoms, and A, a propositional atom.

This given, Meta1 consists of the following clauses.

1. Generating the model.

   (a) in(B) ← atom(B), ¬out(B)

   (b) out(B) ← atom(B), ¬in(B)

2. Computing the Gelfond-Lifschitz reduct.

   (a) unusable(C) ← clause(C), atom(B), neg(B, C), in(B)

   (b) usable(C) ← clause(C), ¬unusable(C)

3. Classifying clauses.

   (a) nempty(C) ← clause(C), atom(B), first(B, C)

   (b) empty(C) ← clause(C), ¬nempty(C)

4. Computation process.

   (a) computed(B) ← clause(C), empty(C), usable(C), head(B, C)

   (b) computed(B) ← clause(C), first(B1, C), second(B2, C),
       computed(B1), computed(B2), head(B, C)

5. Constraints.

   (a) A ← atom(B), in(B), ¬computed(B), ¬A

   (b) A ← atom(B), out(B), computed(B), ¬A

We then can prove the following.

Proposition 3. There is a one-to-one projection that for every propositional
program Q maps stable models of Meta1 ∪ edb_Q to stable models of Q.
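
The guess-and-check reading of the five clause groups can be sketched in
Python. This is only an illustration, not the metainterpreter itself: the
function name stable_models and the tuple encoding of clauses are assumptions
made here, the guess of in/out is done by brute-force enumeration, and the
sketch applies only usable clauses in both computation steps, which is what
the Gelfond-Lifschitz reduct requires (an assumption about the intended
reading of rule 4(b)).

```python
from itertools import combinations

# Hypothetical encoding: a clause is (head, positive_body, negative_body),
# where both bodies are tuples of atom names. In a 0-2 program the positive
# body has either 0 or 2 atoms.
def stable_models(clauses):
    atoms = sorted({c[0] for c in clauses}
                   | {a for c in clauses for a in c[1] + c[2]})
    models = []
    for k in range(len(atoms) + 1):
        for guess in combinations(atoms, k):
            in_set = set(guess)              # group 1: guess in / out
            # group 2: a clause is unusable if a negated atom is guessed "in"
            usable = [c for c in clauses if not (set(c[2]) & in_set)]
            # groups 3-4: least fixed point of the usable clauses
            computed = set()
            changed = True
            while changed:
                changed = False
                for head, pos, _ in usable:
                    if head not in computed and all(b in computed for b in pos):
                        computed.add(head)
                        changed = True
            # group 5: the constraints force in == computed
            if computed == in_set:
                models.append(in_set)
    return models
```

For the program a ← ¬b, b ← ¬a, d ←, c ← a, d the sketch returns the two
sets {b, d} and {a, c, d}; for p ← ¬p it returns no model.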

In the full version of this paper, we construct yet another metainterpreter,
Meta2, for SLP that accepts all propositional programs (not only 0-2 programs)
and has the property that its supported models are automatically stable. The
size of the representation of the extensional database is, however, larger. A
number of metainterpreters for various classes of programs have been
constructed in [EFLP01].

Abstract. We investigate in this paper the relationship between an
ambiguity propagating defeasible logic recently proposed by Antoniou et
al. [3] and well-founded semantics with priorities [6] under a
straightforward translation from defeasible theories to extended logic programs.
It turns out that a slightly restricted version of defeasible logic is
correct wrt well-founded semantics yet incomplete. We also investigate the
sources of the incompleteness and argue that the additional conclusions
obtained by prioritized well-founded semantics are indeed desired.



1 Introduction 

Defeasible Logic was originally proposed by Donald Nute in 1987 [13] (for an 
overview see the handbook article [14]). The logic was never as prominent as, 
say, default logic [17] or circumscription [11]. Yet it has received considerable 
attention in recent years. This is at least partly due to a very active group of 
researchers at Griffith University which has worked on theoretical foundations, 
further development and implementations of defeasible logic(s) [1,2,3,12]. 

The main advantage of defeasible logic is certainly computational: the
computation of conclusions is polynomial and highly efficient implementations
exist [12]. A second advantage is its built-in preference handling facilities.

In the meantime several variants of Defeasible Logic have been proposed. 
All of them are defined proof theoretically. Defeasible logic(s) belong to a class 
of nonmonotonic approaches which can be called directly sceptical. By this we 
mean sceptical approaches where the conclusions, rather than being defined as 
the intersection of extensions or answer sets, are constructed directly. 

In the area of logic programming well-founded semantics can be viewed as 
a directly sceptical semantics. It is interesting to see, therefore, what the ex- 
act relationship between these two approaches is. Since the preference handling 
techniques of defeasible logic have no counterpart in standard well-founded se- 
mantics the comparison will be based on an extension of well-founded semantics 
which was recently proposed by the author of this paper.¹

¹ Although numerous prioritized versions of logic programs under stable model
or answer set semantics exist (see [7] for a discussion of some of these
approaches), not much work has been done on prioritizing well-founded semantics.

To be precise, we will compare one of the arguably most interesting defeasible
logics, an ambiguity propagating variant presented in [3], with the prioritized 
version of well-founded semantics for extended logic programs. This semantics 
was originally proposed in [6]. In this paper we use a considerably simplified 
version. The simplification is possible since for the purposes of this paper we do 
not need the ability to represent preferences in the logical language. 

The major result of this paper is the correctness of the considered defeasible 
logic under the condition that no defeasible rule is preferred to a strict rule. 
We also investigate reasons for the incompleteness of the logic and argue that 
the additional conclusions obtained by well-founded semantics are desirable. The 
paper may thus be read as a critique of defeasible logic. 

The rest of the paper is organized as follows. Sect. 2 describes the ambiguity 
propagating defeasible logic used for comparison in this paper. Sect. 3 presents 
a simplified version of the preferred well-founded semantics in [6]. Sect. 4 intro- 
duces the translation from defeasible logic to extended logic programs. Sect. 5 
establishes the correctness result and Sect. 6 incompleteness. Sect. 7 concludes. 

The analysis in the paper is performed in a propositional setting, that is, 
we consider propositional defeasible theories and propositional extended logic 
programs. 



2 Defeasible Logic 

Defeasible logic was first introduced by Nute [13]. It is based on strict rules of 
the form A → p and defeasible rules of the form A ⇒ p. In both cases A is a set
of literals and p a literal. We omit set brackets whenever A is a singleton set.
Facts are represented as strict rules with empty set of antecedents (in which case 
the arrow is left out). Nute also introduced a third type of rules called defeaters 
which can block the derivation of a literal without giving rise to the derivation 
of the complementary literal. In [2] it is shown that defeaters are not essential 
in the sense that they can be simulated by the other rules. We will therefore not 
discuss defeaters in this paper. 

To solve conflicts among rules Nute used a preference relation > among rules: 
r > r' intuitively stands for: r has higher priority than r' . The preference relation 
is required to be acyclic, i.e. its transitive closure must be irreflexive. 

Nute’s original defeasible logic was not ambiguity propagating. Consider the 
following example: 

Example 1:

1) ⇒ p

2) ⇒ ¬p

3) ⇒ q

4) p ⇒ ¬q

Since p is not defeasibly provable (because of the conflicting second rule)
rule 4) is disregarded and q is defeasibly provable in Nute’s logic. This seems
highly questionable since, although p is not accepted, there is an argument
supporting ¬q which should not be disregarded in a sceptical approach.

For this reason Antoniou and colleagues [3] defined an “ambiguity propa- 
gating defeasible logic without team defeat” which behaves as desired in the 
example. We consider this logic as one of the most interesting variants of defea- 
sible logic and use it for our comparison in this paper. 

A defeasible theory is a pair T = (R, >) where R is a finite set of strict and
defeasible rules and > is the preference relation among R. A conclusion of T
is a tagged literal: +Δq means q is strictly provable, +δq means q is defeasibly
provable, and +σq means q is supported.² The tags preceded by minus signs
stand for the corresponding negated expressions. A proof is a finite sequence P
of tagged literals. P(i) denotes the i-th element in the sequence, P(1..i) its
initial segment of length i. The complement of a literal q is denoted ∼q. R[q]
is the set of rules with head q, R_s[q] the subset of R[q] consisting of all
strict rules. The antecedents of a rule r are denoted A(r).

Inference rules are phrased as conditions on proofs as follows:³

+Δ: If P(i + 1) = +Δq then
      ∃r ∈ R_s[q] ∀a ∈ A(r): +Δa ∈ P(1..i).

−Δ: If P(i + 1) = −Δq then
      ∀r ∈ R_s[q] ∃a ∈ A(r): −Δa ∈ P(1..i).

+δ: If P(i + 1) = +δq then
      (1) +Δq ∈ P(1..i) or
      (2.1) −Δ∼q ∈ P(1..i) and
      (2.2) ∃r ∈ R[q] such that
              ∀a ∈ A(r): +δa ∈ P(1..i) and
              ∀s ∈ R[∼q]:
                ∃a ∈ A(s): −σa ∈ P(1..i) or
                r > s.

−δ: If P(i + 1) = −δq then
      (1) −Δq ∈ P(1..i) and
      (2.1) +Δ∼q ∈ P(1..i) or
      (2.2) ∀r ∈ R[q]:
              ∃a ∈ A(r): −δa ∈ P(1..i) or
              ∃s ∈ R[∼q] such that
                ∀a ∈ A(s): +σa ∈ P(1..i) and
                r ≯ s.

+σ: If P(i + 1) = +σq then
      +Δq ∈ P(1..i) or
      ∃r ∈ R[q] such that
        ∀a ∈ A(r): +σa ∈ P(1..i) and
        ∀s ∈ R[∼q]:
          ∃a ∈ A(s): −δa ∈ P(1..i) or
          s ≯ r.

−σ: If P(i + 1) = −σq then
      −Δq ∈ P(1..i) and
      ∀r ∈ R[q]:
        ∃a ∈ A(r): −σa ∈ P(1..i) or
        ∃s ∈ R[∼q] such that
          ∀a ∈ A(s): +δa ∈ P(1..i) and
          s > r.

² We use σ here rather than the less readable and less mnemonic symbol used in [3].

³ The rule for +σ in [3] had a mistake in the last line (G. Antoniou, personal
communication) which is corrected here.

Consider the following example:

Example 2:

1) p

2) p → q

3) q ⇒ r

4) ⇒ ¬r

Assume there are no priorities. Here is a proof for +σr:

+Δp, +Δq, +σq, +σr.

If we add the preference 4 > 3 the last step in the proof does not go through.
Indeed, we now have the following proof for +δ¬r:

−Δr, +δ¬r.
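
Since every proof condition only asks whether certain tagged literals already
occur earlier in the proof, the set of provable tagged literals can be computed
as a least fixed point. The following Python sketch does this under assumptions
made purely for illustration: literals are strings with a leading "-" for
classical negation, rules are stored as name: (kind, antecedents, head), and
the tags +Δ, −Δ, +δ, −δ, +σ, −σ are written "+D", "-D", "+d", "-d", "+s", "-s".
The function name conclusions is hypothetical.

```python
def neg(l):
    """Complement of a literal; "-p" stands for the negation of p."""
    return l[1:] if l.startswith("-") else "-" + l

def conclusions(rules, prefer=frozenset()):
    """rules: {name: (kind, antecedents, head)}, kind "strict" or "defeasible".
    prefer: set of pairs (higher, lower) of rule names."""
    lits = set()
    for _, body, head in rules.values():
        for l in (head, *body):
            lits |= {l, neg(l)}

    def R(q, strict_only=False):
        return [n for n, (kind, _, head) in rules.items()
                if head == q and (kind == "strict" or not strict_only)]

    def A(n):
        return rules[n][1]

    def gt(a, b):
        return (a, b) in prefer

    P = set()  # tagged literals proved so far

    def holds(tag, q):
        if tag == "+D":
            return any(all(("+D", a) in P for a in A(r)) for r in R(q, True))
        if tag == "-D":
            return all(any(("-D", a) in P for a in A(r)) for r in R(q, True))
        if tag == "+d":
            return ("+D", q) in P or (("-D", neg(q)) in P and any(
                all(("+d", a) in P for a in A(r)) and
                all(any(("-s", a) in P for a in A(s)) or gt(r, s)
                    for s in R(neg(q)))
                for r in R(q)))
        if tag == "-d":
            return ("-D", q) in P and (("+D", neg(q)) in P or all(
                any(("-d", a) in P for a in A(r)) or
                any(all(("+s", a) in P for a in A(s)) and not gt(r, s)
                    for s in R(neg(q)))
                for r in R(q)))
        if tag == "+s":
            return ("+D", q) in P or any(
                all(("+s", a) in P for a in A(r)) and
                all(any(("-d", a) in P for a in A(s)) or not gt(s, r)
                    for s in R(neg(q)))
                for r in R(q))
        if tag == "-s":
            return ("-D", q) in P and all(
                any(("-s", a) in P for a in A(r)) or
                any(all(("+d", a) in P for a in A(s)) and gt(s, r)
                    for s in R(neg(q)))
                for r in R(q))
        return False

    changed = True
    while changed:  # add tagged literals until nothing new is provable
        changed = False
        for tag in ("+D", "-D", "+d", "-d", "+s", "-s"):
            for q in lits:
                if (tag, q) not in P and holds(tag, q):
                    P.add((tag, q))
                    changed = True
    return P
```

On the rules of Example 2 the sketch yields ("+s", "r") without priorities and
("+d", "-r") once the preference (4, 3) is added; on Example 1 it does not
yield ("+d", "q"), which is the ambiguity propagating behaviour described above.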



3 Prioritized Well-Founded Semantics 

In this section we present a simplified version of the well-founded semantics for 
extended logic programs with priorities which was defined in [6]. The simpli- 
fication is possible because we are not interested here in expressing preference 
information in the logical language (for a discussion why this may be useful 
see [6]). 

A (propositional) extended logic program is a finite set of rules of the form

    c ← a_1, ..., a_n, not b_1, ..., not b_m

where the a_i, b_j and c are propositional literals, i.e., either propositional
atoms or such atoms preceded by the classical negation sign. The symbol not
denotes negation by failure (default negation), ¬ denotes strong negation. An
extended logic program is a finite set P of rules. A prioritized logic program
is a pair (P, >) where P is an extended logic program and > an acyclic
preference relation on P: as in defeasible logic, r > r' stands for r is
preferred over r'. In [6] > was required to be transitive. This restriction is
not necessary and is dropped here for the purpose of comparison. Note also that
in the earlier paper the “smaller” rules were preferred rather than the
“bigger” rules as in this paper.

Well-founded semantics is an inherently sceptical semantics that refrains from
drawing conclusions whenever there is a potential conflict. Its original
formulation for general logic programs by Van Gelder, Ross and Schlipf [9] is
based on a certain partial model. Przymusinski reconstructed this definition in
3-valued logic [15,16]. A reformulation based on the least fixed point of a
monotone operator, namely the twofold application of the Gelfond/Lifschitz
γ-operator [8], was first given by Baral and Subrahmanian [5]. The
straightforward extension of this formulation to extended logic programs that
underlies our approach was used by several authors, e.g. [4,10].

Let us first introduce the γ-operator. We say a rule r of the form above is
defeated by a literal l if l = b_i for some i ∈ {1, ..., m}. We say r is
defeated by a set of literals X if X contains a literal that defeats r.

Let P be a logic program, X a set of literals. The X-reduct of P, denoted P^X,
is the program obtained from P by deleting each rule defeated by X. For a set
of rules R, the closure Cl(R) is the smallest set of literals closed under R and
the consequences Cn(R) the smallest set of literals that is (1) closed under R,
and (2) logically closed, i.e., either consistent or equal to the set of all
literals. For the computation of the closure we simply neglect default negated
literals.

The Gelfond/Lifschitz operator γ_P now is defined as follows:

    γ_P(X) = Cn(P^X)

For normal logic programs (that is, programs without strong negation) the
atoms true according to well-founded semantics are just the least fixed point
of the twofold application of γ. It was argued in [6] that for the extension of
well-founded semantics to extended logic programs with two kinds of negation it
is favourable to slightly modify the fixed point operator: rather than computing
the least fixed point of γ², Brewka proposed to compute the least fixed point of
γγ*, where γ*, rather than yielding the consequences Cn(P^X), yields the closure
Cl(P^X). This leads to a larger set of well-founded conclusions without
violating correctness wrt answer set semantics.

The intuition behind well-founded semantics can be described as follows:
given a set of literals S already known to be derivable, γ*(S) produces a set of
potential conclusions which still might defeat rules in P. The conclusions of
rules not defeated by any of the potential defeaters are clearly derivable.
Starting with the empty set, we thus generate larger and larger sets S until a
fixed point is reached. The following terminology reflects this intuition:

Definition 1. Let P be an extended logic program.

– A literal l is an S-potential defeater iff l is in the closure of the rules
  in P not defeated by S.

– A rule r is S-undefeatable iff r is not defeated by any S-potential defeater.

– A literal l is S-derivable iff l is a consequence of S-undefeatable rules in P.

It is obvious that l is S-derivable iff l ∈ γ(γ*(S)). The least fixed point of
γγ* is called WFS(P), or simply WFS if P is clear from context.
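
The construction can be sketched in a few lines of Python. The encoding is an
assumption made for illustration only: a rule is a triple (head, positive body,
default-negated body), strong negation is written as a leading "-", and the
function names are hypothetical. The sketch iterates γγ* from the empty set,
which reaches the least fixed point because the composed operator is monotone.

```python
def neg(l):
    """Complement of a literal; "-p" stands for the strong negation of p."""
    return l[1:] if l.startswith("-") else "-" + l

def literals(program):
    lits = set()
    for head, pos, naf in program:
        for l in (head, *pos, *naf):
            lits |= {l, neg(l)}
    return lits

def reduct(program, X):
    """P^X: delete every rule containing 'not b' for some b in X."""
    return [r for r in program if not (set(r[2]) & X)]

def cl(rules):
    """Cl(R): least set closed under R, neglecting default-negated literals."""
    out, changed = set(), True
    while changed:
        changed = False
        for head, pos, _ in rules:
            if head not in out and all(a in out for a in pos):
                out.add(head)
                changed = True
    return out

def cn(rules, all_lits):
    """Cn(R): as Cl(R), but an inconsistent set collapses to all literals."""
    out = cl(rules)
    return set(all_lits) if any(neg(l) in out for l in out) else out

def wfs(program):
    lits = literals(program)
    S = set()
    while True:
        potential = cl(reduct(program, S))          # gamma*(S)
        nxt = cn(reduct(program, potential), lits)  # gamma(gamma*(S))
        if nxt == S:
            return S
        S = nxt
```

For the program p ← not q, q ← not p the sketch returns the empty set, and for
a ←, b ← a, not c it returns {a, b}.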

To take preferences into account we first introduce a notion of dominance.
Intuitively, a rule r dominates a rule r' in the context of a set of literals S
if r has higher priority and if the application of r in context S actually
defeats r'. As pointed out in [6] the second condition is necessary to guarantee
that prioritized well-founded semantics is an extension of well-founded
semantics. Here is the formal definition:

Definition 2. Let r and r' be rules, S a set of literals. We say r S-dominates
r' iff

1. r > r', and

2. Cl({r} ∪ {s : s is S-undefeatable}) defeats r'.

For the case of prioritized programs Def. 1 becomes

Definition 3. Let (P, >) be a prioritized logic program.

– A literal l is an S-potential r-defeater iff l is in the closure of rules in
  P which are (1) not defeated by S and (2) not S-dominated by r.

– A rule r is S-safe iff r is not defeated by any S-potential r-defeater.

– A literal l is S-derivable iff l is a consequence of the set of S-safe rules
  in P.

The definition for prioritized logic programs is different from the one for non- 
prioritized programs in two respects. Firstly, there is not a single set of potential 
defeaters for all rules but each rule r has its own set of potential defeaters. 
Secondly, the rules which are used to derive potential defeaters must satisfy 
an additional condition: to potentially defeat r a rule must not be dominated 
by r in context S. Since fewer rules can be used to derive potential defeaters 
for a rule the safe rules are a superset of the undefeatable rules. We thus obtain 
more derivable literals. For the special case where > is empty the two definitions 
of S-derivable clearly coincide.

The set of S-derivable literals grows monotonically with S. We thus can start 
as usual with the empty set of literals and iterate the computation of S-derivable 
formulas until a fixed point is reached. 
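
Definitions 2 and 3 translate fairly directly into a fixed-point computation.
The Python sketch below is a hedged illustration rather than a reference
implementation: the dictionary encoding name: (head, positive body,
default-negated body), the leading "-" for strong negation, and the function
name pwfs are all assumptions introduced here.

```python
def neg(l):
    """Complement of a literal; "-p" stands for the strong negation of p."""
    return l[1:] if l.startswith("-") else "-" + l

def cl(rules):
    """Closure of a list of rules, neglecting default-negated literals."""
    out, changed = set(), True
    while changed:
        changed = False
        for head, pos, _ in rules:
            if head not in out and all(a in out for a in pos):
                out.add(head)
                changed = True
    return out

def pwfs(program, prefer=frozenset()):
    """program: {name: (head, pos, naf)}; prefer: set of (higher, lower)."""
    names = list(program)
    lits = {x for h, pos, naf in program.values()
            for l in (h, *pos, *naf) for x in (l, neg(l))}

    def defeated(n, X):
        return bool(set(program[n][2]) & X)

    def closure(ns):
        return cl([program[n] for n in ns])

    def derivable(S):
        live = [n for n in names if not defeated(n, S)]
        potential = closure(live)                 # S-potential defeaters
        undefeatable = [n for n in names if not defeated(n, potential)]

        def dominates(r, r2):                     # Definition 2
            return ((r, r2) in prefer
                    and defeated(r2, closure([r] + undefeatable)))

        safe = []
        for r in names:                           # Definition 3
            r_potential = closure([n for n in live if not dominates(r, n)])
            if not defeated(r, r_potential):
                safe.append(r)
        out = closure(safe)
        # consequences are logically closed: inconsistency yields all literals
        return set(lits) if any(neg(l) in out for l in out) else out

    S = set()
    while True:                                   # iterate from the empty set
        nxt = derivable(S)
        if nxt == S:
            return S
        S = nxt
```

With an empty preference relation the sketch computes the unprioritized
S-derivable literals of Definition 1; with preferences, each rule gets its own
set of potential defeaters, as described above.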

Here is a small example illustrating the definition. 

Example 3:

1) c ← not ¬c, a

2) ¬c ← not c

3) a

Let 1 > 2. Clearly, rule 3 is ∅-safe since there is no way of defeating a rule
without default negation. But also 1) is ∅-safe since the closure of 1 together
with the ∅-undefeatable rule 3 defeats 2 and thus 1 ∅-dominates 2. Therefore the
set of ∅-derivable literals is {c, a}. This set is already the least fixed
point.


4 The Translation

We use a straightforward modular translation Trans from defeasible theories
T = (R, >) to extended logic programs Trans(T) = (Trans(R), >') where
Trans(R) = {Trans(r) : r ∈ R} and the translation of each rule is defined
as follows:

    {a_1, ..., a_n} → b becomes b ← a_1, ..., a_n

    {a_1, ..., a_n} ⇒ b becomes b ← not ∼b, a_1, ..., a_n

Furthermore, we require that Trans(r) >' Trans(r') iff r > r' (in the rest of
the paper we use the same symbol for the two preference relations because we
do not expect this to cause any confusion).

The prioritized logic programs obtained this way are a proper subset of priori- 
tized extended logic programs which we call defeasible logic programs. Defeasible 
logic programs use default negation in a highly restricted way (corresponding to 
normal defaults in default logic). 
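
The translation is simple enough to state as a short Python sketch. The
encoding is again an assumption made for illustration: a defeasible-logic rule
is (kind, antecedents, head) with kind "strict" or "defeasible", a translated
rule is (head, positive body, default-negated body), and a leading "-" marks
classical negation. The names trans_rule and trans are hypothetical.

```python
def neg(l):
    """Complement of a literal; "-p" stands for the negation of p."""
    return l[1:] if l.startswith("-") else "-" + l

def trans_rule(kind, antecedents, head):
    """Translate one rule into (head, positive_body, default_negated_body)."""
    if kind == "strict":
        # {a_1, ..., a_n} -> b   becomes   b <- a_1, ..., a_n
        return (head, tuple(antecedents), ())
    # {a_1, ..., a_n} => b   becomes   b <- not ~b, a_1, ..., a_n
    return (head, tuple(antecedents), (neg(head),))

def trans(theory, prefer=frozenset()):
    """theory: {name: (kind, antecedents, head)}.
    Rule names are kept, so the preference relation carries over unchanged."""
    program = {name: trans_rule(*rule) for name, rule in theory.items()}
    return program, set(prefer)
```

Because the translation is modular, each rule is handled on its own, and only
defeasible rules receive a default-negated literal, namely the complement of
their own head.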

5 (In)correctness

We first investigate correctness of defeasible logic wrt prioritized well-founded
semantics, that is, the question whether for each defeasible conclusion +δp of a
defeasible theory T we have p ∈ WFS(Trans(T)). The answer for the general
case will be no, but for a somewhat restricted case correctness can be established.

The negative answer for the general case can be demonstrated by the follow- 
ing counterexample (we put the defeasible logic rules and their translation into 
the same line): 

Example 4:

1) ⇒ p        p ← not ¬p

2) p → q      q ← p

3) ⇒ ¬q       ¬q ← not q

Assume 3 > 2. Now +δ¬q is a conclusion which can be established through the
following derivation:

−Δp, −Δq, +δ¬q

Well-founded semantics, on the other hand, concludes q but not ¬q: although
3 has higher priority than 2 it does not dominate 2 since a strict rule can never
be defeated. From this we have the following proposition:

Proposition 1. Defeasible logic is incorrect wrt prioritized well-founded
semantics: there is a defeasible theory T = (R, >) and a formula q such that
+δq is a consequence of T but q ∉ WFS(Trans(T)).

The example already hints at the source of the incorrectness. In defeasible
logic a strict rule can be overridden by a defeasible rule with higher priority.
This can never happen in well-founded semantics where the conclusion of a strict
rule is accepted whenever its antecedents are accepted, no matter what the
preferences are. Indeed, a restriction on the admissible preferences turns out
to be sufficient for obtaining correctness.

Proposition 2. Let T = (R, >) be a defeasible theory such that > is defined on
defeasible rules only, Trans(T) = (P, >) its translation. If +δq is a conclusion
of T then q ∈ WFS(Trans(T)).

Proof. For the proof we show the following lemmata: Let S = S_0 ∪ S_1 ∪ ... be
the least fixed point reached for Trans(T), that is, S_0 = ∅ and for all i > 0,
S_i is the set of S_{i−1}-derivable literals. Then the following results hold:

Lemma 1. If +Δq is a conclusion of T, then q ∈ S_1 (and henceforth in all S_j
for j > 1).

Lemma 2. If +δq is a conclusion of T, then q ∈ S_i for some i (and henceforth
in all S_j for j > i).

Lemma 3. If −σq is a conclusion of T, then q is not an S-potential defeater,
that is, there is an i such that q is not an S_i-potential defeater (and
henceforth not an S_j-potential defeater for all j > i).

Note that our proposition is equivalent to Lemma 2. Lemma 1 is immediate since
strict rules can never be defeated.

The proof for Lemmas 2 and 3 is by joint induction on the length n of the
shortest proof of the corresponding tagged literals. The base case can be checked
easily. For the inductive step we assume that Lemmas 2 and 3 hold for tagged
literals whose shortest proofs are of length at most n − 1. We have to distinguish
2 cases representing the possible tagged literals appearing in the lemmata:

Case +δq:

There are two alternatives:

a) +Δq appears in the proof before +δq; then, according to Lemma 1, q is already
in S_1 and thus in S, or

b) there is a rule r with head q whose antecedents are, by induction hypothesis,
in S_j for some j, and for all conflicting rules r': an antecedent is, by
induction hypothesis, not an S_k-potential defeater, for some k, or r' is
S_j-dominated by r. Hence r is S_m-safe for sufficiently large m and thus q ∈ S.
Note that for domination to hold we need the fact that r' is a defeasible rule,
otherwise r could not dominate r'.

Case −σq:

We know that for each rule r with head q one of the following 2 alternatives
holds:

a) there is an antecedent which is, by induction hypothesis, not an
S_j-potential defeater for some j, or

b) there is a conflicting rule s whose antecedents are, by induction hypothesis,
already in S_i, for some i, and which S_i-dominates r (since s > r, rule r must
be defeasible and thus domination follows from the fact that the rules have
complementary heads).

Let k be the smallest integer such that for each rule r with head q an
antecedent of r is not an S_k-potential defeater (case a) or r is S_k-dominated
by a rule with complementary head (case b). Let D be the set of rules with head
q for which case b) holds but not case a). There are two possibilities: if D is
empty then q is not a potential S_k-defeater. If D is not empty then, since > is
acyclic, there must be a rule r' among the rules S_k-dominating elements of D
which is itself not S_k-dominated and thus S_k-safe. S_{k+1} therefore contains
∼q, all rules for which case b) holds are thus S_{k+1}-defeated and q is not an
S_{k+1}-potential conclusion.



6 Incompleteness 

We next discuss completeness. It turns out that defeasible logic is incomplete 
wrt prioritized well-founded semantics: 

Proposition 3. Let T = (R, >) be a defeasible theory, and Trans(T) = (P, >)
its translation. q ∈ WFS(Trans(T)) does not imply that +δq is a conclusion
of T.

To prove the proposition we will discuss some counterexamples which also illus- 
trate the sources of the incompleteness. 

Here is a first counterexample: 

Example 5:

1) ⇒ ¬p       ¬p ← not p

2) p → p      p ← p

Assume there are no preferences. There is no proof for +δ¬p. Although the
conflicting rule 2 can never be used to derive p, the mere existence of the rule
is regarded as sufficient reason not to conclude ¬p. Well-founded semantics, on
the other hand, concludes ¬p: p is not a potential ∅-defeater, 1 is thus
∅-undefeatable and used to derive ¬p. Well-founded semantics thus implicitly
performs the kind of loop checking which is lacking in defeasible logic.

For the next counterexample consider again the rules of Example 4: 

Example 6:

1) ⇒ p        p ← not ¬p

2) p → q      q ← p

3) ⇒ ¬q       ¬q ← not q

This time we assume no priorities. Clearly q ∈ WFS(Trans(T)) but +δq is
not a conclusion of T. This illustrates a major difference in the way strict
rules are treated. In well-founded semantics a strict rule is applied whenever
its antecedents are accepted, independently of whether the antecedents are
derived using strict or defeasible rules. In defeasible logic strict rules have
a different role, depending on whether all antecedents have strict proofs or
not. If this is the case, then the rule is applied. If one of the antecedents is
only defeasibly derivable then the strict rule is treated like a defeasible rule
and may be blocked by a conflicting defeasible rule, as in our example.

Is this ambivalent role of rules adequate? In other words, is well-founded 
semantics sometimes not cautious enough? We do not think so. As the authors 
of [3] point out, “strict rules are intended to define relationships that are
definitional in nature”. The example they give is emu(X) → bird(X). If there is
a definitional relationship between emus and birds it seems fully adequate to
accept, say, bird(Tweety) if emu(Tweety) is accepted. The additional
conclusions obtained by well-founded semantics in examples like the one above
seem perfectly reasonable.

The next example shows that well-founded semantics takes more preferences 
into account than defeasible logic. 

Example 7:

1) ⇒ p        p ← not ¬p

2) p → ¬q     ¬q ← p

3) ⇒ q        q ← not ¬q

4) q → ¬p     ¬p ← q

Assume 1 > 3. Defeasible logic does not conclude +δp. The reason is that only
preferences of rules with complementary heads play a role in the proof theory
of defeasible logic. Preferences among other rules are simply disregarded.
Well-founded semantics, on the other hand, concludes p in the example. Rule 1 is
∅-safe since the closure of 1 together with the ∅-undefeatable rule 2 defeats 3,
that is, 1 ∅-dominates 3.

Again we believe that the additional conclusions obtained by well-founded 
semantics are perfectly reasonable. 

7 Conclusions 

In this paper we have analyzed the relationship between defeasible logic and 
well-founded semantics for prioritized extended logic programs with two types 
of negation. For the comparison we used a straightforward modular translation 
of defeasible theories to extended logic programs. The analysis was based on the 
arguably most attractive variant of defeasible logic, the ambiguity propagating 
defeasible logic presented in [3]. The prioritized well-founded semantics we used
is a considerably simplified version of a semantics proposed in [6]. The
simplification was possible since for the purpose of this paper the ability to
represent preference information in the logical language was not essential.

It turned out that, although correctness does not hold in general, a minor 
restriction is sufficient to guarantee correctness: if we admit preferences between
defeasible rules only, then all defeasibly provable literals are true in prioritized
well-founded semantics. It should be mentioned that the use of the ambiguity 
propagating variant of defeasible logic without team defeat clearly is essential 
for this result. Nute’s original version is obviously incorrect (see Example 1), as
is any other variant without ambiguity propagation.

We also analyzed the sources of the incompleteness of defeasible logic. It 
turned out that three factors contribute to the incompleteness: 1) the lack of 
loop checking, 2) the somewhat ambivalent role of strict rules which - so to 
speak - turn into defeasible rules if not all antecedents are strictly provable, and 
3) the preference handling which completely neglects preferences between rules 
which do not have complementary literals. 

From a semantical point of view well-founded semantics seems to have clear 
advantages: the additional conclusions obtained seem perfectly reasonable. More- 
over, in comparison with the complex rules of defeasible logic the definition of 
well-founded semantics is quite simple and elegant. Finally, the semantics is de- 
fined for a much larger class of programs than those obtained by translating 
defeasible theories. 

What remains is the computational aspect. In both approaches the compu- 
tation of conclusions is polynomial in the size of the rule base. In [6] the variant 
of well-founded semantics where preference information is expressed in the lan- 
guage is reported to be of cubic complexity. In [12] Nute’s defeasible logic is 
reported to be of linear complexity. It remains an issue for further study how 
these results transfer to the variants discussed in this paper, and whether there 
are applications where a possible computational advantage of defeasible logic 
can outweigh its semantical disadvantages.

Abstract. Much work has been done on extending the well-founded
semantics to general disjunctive logic programs and various approaches
have been proposed. However, no consensus has been reached about
which semantics is the most intended. In this paper we look at
disjunctive well-founded reasoning from different angles. We show that there
is an intuitive form of the well-founded reasoning in disjunctive logic
programming which can be equivalently characterized by several
different approaches including program transformations, argumentation,
unfounded sets (and resolution-like procedure). We also provide a
bottom-up procedure for this semantics. The significance of this work is not
only in clarifying the relationship among different approaches, but also
in providing novel arguments in favor of our semantics.



1 Introduction 

The importance of representing and reasoning about disjunctive information has 
been addressed by many researchers. Disjunctive logic programming (DLP) is 
widely believed to be a suitable tool for formalizing disjunctive reasoning and 
it has received extensive study in recent years. Since DLP admits both default 
negation and disjunction, the issue of finding a suitable semantics for disjunctive 
programs is more difficult than it is in the case of normal (i. e. non-disjunctive) 
logic programs. Usually, skepticism and credulism represent two major semantic 
intuitions for knowledge representation in artificial intelligence. The well-founded 
semantics [12] is a formalism of skeptical reasoning in normal logic programming 
while the stable semantics [6] formalizes credulous reasoning. Recently,
considerable effort has been devoted to generalizing these two semantics to
disjunctive logic programs. However, the task of generalizing the well-founded
model to disjunctive programs has proven to be complex. There have been various
proposals for defining the well-founded semantics for general disjunctive logic
programs [8].
As argued by some authors (for instance [2,10,13]), each of the previous versions 
of the disjunctive well-founded semantics bears its own drawbacks. Moreover, no 
consensus has been reached about what constitutes an intended well-founded se- 
mantics for disjunctive logic programs. The semantics D-WFS [1,2], STATIC [10] 
and WFDS [13] are among the most recent approaches to defining disjunctive
well-founded semantics. D-WFS is based on a series of abstract properties and it 
is the weakest (least) semantics that is invariant under a set of program transfor- 
mations. STATIC has its root in autoepistemic logic and is based on the notion of 
static expansions for belief theories. The semantics STATIC(P) for a disjunctive 
program P is defined as the least static expansion of $P_{AEB}$ where $P_{AEB}$ is the
belief theory corresponding to P. The basic idea of WFDS is to transform P into 
an argumentation framework and WFDS(P) is specified by the least acceptable 
hypothesis of P. Although these semantics stem from very different intuitions, all 
of them share a number of attractive properties. In particular, each of these se- 
mantics extends both the well-founded semantics [12] for normal logic programs 
and the generalized closed world assumption (GCWA) [9] for positive disjunctive 
programs (i. e. without default negation). 

It has been proven that D-WFS is equivalent to a restricted version of 
STATIC [3]. But the relation of these semantics to the argumentation-based 
semantics and unfounded sets is as yet unclear. In this paper, we modify some
existing semantics to make them more intuitive and report further equivalence 
results. First, we define a transformation-based semantics denoted D-WFS* by 
introducing a new transformation into Brass and Dix’s set $T_{WFS}$ of program
transformations. This semantics naturally extends D-WFS and enjoys all the 
important properties that have been proven for D-WFS. We prove that WFDS 
is equivalent to D-WFS*. We also provide a bottom-up evaluation procedure 
for WFDS (and D-WFS*). Second, we define a new notion of unfounded sets 
which is a generalization of the unfounded sets defined in [7,5]. Based on this 
new notion of unfounded sets, we define a well-founded semantics U-WFS for 
disjunctive programs. We show that U-WFS is equivalent to WFDS (and thus 
D-WFS*). Moreover, in [14] we have developed a top-down procedure D-SLS 
Resolution which is sound and complete with respect to our semantics. D-SLS 
extends both SLS-resolution [11] and SLI-resolution [8]. Altogether we obtain 
the following equivalence results: 

WFDS = D-WFS* = U-WFS = D-SLS. 

We consider these results to be quite significant: (1) Our results clarify the
relationship among several different approaches to defining disjunctive
well-founded semantics, including argumentation-based, transformation-based,
unfounded sets-based and resolution-based approaches. (2) Since the four
semantics are based on very different intuitions, these equivalent
characterizations in turn provide further arguments in favor of our semantics.
(3) The top-down procedure D-SLS Resolution [14] and the bottom-up query
evaluation proposed in this paper offer two different routes to implementing
our semantics.

The rest of this paper is arranged as follows. In Section 2 we recall some 
basic definitions and notation; we present in Section 3 a slightly restricted form 
of the well-founded semantics WFDS. In Section 4 we introduce a new pro- 
gram transformation Head reduction and then define the transformation-based 
semantics D-WFS*, which naturally extends D-WFS. In Section 5, we first provide
a bottom-up query evaluation for D-WFS* (and WFDS) and then prove
the equivalence of D-WFS* and WFDS. Section 6 introduces the new notion of 
unfounded sets and defines the well-founded semantics U-WFS. We also show 
that U-WFS is equivalent to WFDS. Section 7 is our conclusion. Proofs of the 
theorems are given in the full version of this paper. 

2 Preliminaries 

We briefly review most of the basic notions used throughout this paper. 

A disjunctive logic program is a finite set of rules of the form 

$$a_1 \vee \cdots \vee a_n \leftarrow b_1, \ldots, b_m, \mathit{not}\ c_1, \ldots, \mathit{not}\ c_t \qquad (1)$$

where $a_i$, $b_i$, $c_i$ are atoms and $n > 0$. The default negation 'not a' of an atom a is
called a negative literal. In this paper we consider only propositional programs 
although many definitions and results hold for predicate logic programs. 

P is a normal logic program if it contains no disjunctions. 

If a rule of form (1) contains no negative body literals, it is called positive; P 
is a positive program if every rule of P is positive. 

If a rule of form (1) contains no body atoms, it is called negative; P is a 
negative program if every rule of P is negative. 

Following [2], we also say a negative rule r is a conditional fact. That is, a
conditional fact is of the form $a_1 \vee \cdots \vee a_n \leftarrow \mathit{not}\ c_1, \ldots, \mathit{not}\ c_m$,
where $a_k$ and $c_j$ are (ground) atoms for $1 \le k \le n$ and $0 \le j \le m$.

For a rule r of form (1), $body(r) = body^+(r) \cup body^-(r)$ where
$body^+(r) = \{b_1, \ldots, b_m\}$ and $body^-(r) = \{\mathit{not}\ c_1, \ldots, \mathit{not}\ c_t\}$;
$head(r) = a_1 \vee \cdots \vee a_n$. When no confusion is caused, we also use
$head(r)$ to denote the set of atoms in $head(r)$. For instance, $a \in head(r)$
means that a appears in the head of r. If A is a set of atoms, $head(r) - A$ is
the disjunction obtained from $head(r)$ by deleting the atoms in A. The set
$head(P)$ consists of all atoms appearing in rule heads of P.

As usual, $B_P$ is the Herbrand base of disjunctive logic program P, that is, the
set of all (ground) atoms in P. A positive (negative) disjunction is a disjunction
of atoms (negative literals) in P. A pure disjunction is either a positive one or
a negative one. The disjunctive base of P is $DB_P = DB_P^+ \cup DB_P^-$ where $DB_P^+$
is the set of all positive disjunctions in P and $DB_P^-$ is the set of all negative
disjunctions in P. If A and $B = A \vee A'$ are two disjunctions, then we say A is a
sub-disjunction of B, denoted $A \subseteq B$.

A model state of a disjunctive program P is a subset of $DB_P$. Usually, a
well-founded semantics for a disjunctive logic program is defined by a model 
state. 

If S is an expression (a set of literals, a disjunction or a set of disjunctions), 
atoms(S) denotes the set of all atoms appearing in S.

For simplicity, we assume that all model states are closed under implication 
of pure disjunctions. That is, for any model state S, if A is a sub-disjunction of 



136 Kewen Wang 



a pure disjunction B and $A \in S$, then $B \in S$. For instance, if $S = \{a, b \vee c\}$,
then $a \vee b \vee c \in S$.

Given a model state S and a pure disjunction A, we also say A is satisfied
by S, denoted $S \models A$, if $A \in S$.

We assume that all disjunctions have been simplified by deleting the repeated
literals. For example, the disjunction $a \vee b \vee b$ is actually the disjunction $a \vee b$.
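
The closure assumption on model states can be made concrete with a small sketch. The Python below is illustrative only: the encoding of a pure disjunction as a pair (sign, set of atoms) and the names `disj` and `satisfies` are assumptions introduced here, not notation from the paper. Instead of materializing the closure, it keeps only the generating disjunctions and tests satisfaction by looking for a stored sub-disjunction of the same sign.

```python
from typing import FrozenSet, Set, Tuple

# Hypothetical encoding: a pure disjunction is (sign, atoms), with sign "+"
# for a positive disjunction and "-" for a negative one.
Disjunction = Tuple[str, FrozenSet[str]]

def disj(sign: str, *atoms: str) -> Disjunction:
    return (sign, frozenset(atoms))

def satisfies(state: Set[Disjunction], query: Disjunction) -> bool:
    """S |= A: some generator of S is a sub-disjunction of A (same sign)."""
    sign, atoms = query
    return any(s == sign and d <= atoms for s, d in state)

# The model state S = {a, b v c} from the text.
S = {disj("+", "a"), disj("+", "b", "c")}
print(satisfies(S, disj("+", "a", "b", "c")))  # True: implied by a (and by b v c)
print(satisfies(S, disj("+", "b")))            # False: b alone is not implied
```

Storing generators keeps the state small; the closed model state is recovered implicitly by `satisfies`.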

3 Argumentation and Well-Founded Semantics 

As illustrated in [13]¹, argumentation provides a unifying semantic framework
for DLP. The basic idea of the argumentation-based approach for DLP is to
translate each disjunctive logic program into an argument framework
$F_P = \langle P, DB_P^-, \leadsto_P \rangle$. In the framework defined in [13], an assumption
of P is a negative disjunction of P, and a hypothesis is a set of assumptions;
$\leadsto_P$ is an attack relation among the hypotheses. An admissible hypothesis
$\Delta$ is one that can attack every hypothesis which attacks it. The intuitive
meaning of an assumption $\mathit{not}\ a_1 \vee \cdots \vee \mathit{not}\ a_m$ is that
$a_1 \wedge \cdots \wedge a_m$ cannot be proven from the disjunctive program.

Given a hypothesis $\Delta$ of a disjunctive program P, we can easily reduce P, in
the spirit of the GL-transformation [6], to another disjunctive program without
default negation.

Definition 1. Let $\Delta$ be a hypothesis of disjunctive program P. Then the reduct
of P with respect to $\Delta$ is the disjunctive program

$$P_\Delta^+ = \{ head(r) \leftarrow body^+(r) \mid r \in P \text{ and } body^-(r) \subseteq \Delta \}.$$

The following definition introduces a special resolution $\vdash_P$ which resolves
default-negation literals with a disjunction.

Definition 2. Let $\Delta$ be a hypothesis of disjunctive program P and $A \in DB_P^+$. If
there exist $B \in DB_P^+$ and $\mathit{not}\ b_1, \ldots, \mathit{not}\ b_m \in \Delta$ such that
$B = A \vee b_1 \vee \cdots \vee b_m$ and $P_\Delta^+ \models B$, then $\Delta$ is said to be a
supporting hypothesis for A, denoted $\Delta \vdash_P A$. Here $\models$ is the inference
relation of classical propositional logic.

The set of all positive disjunctions supported by $\Delta$ is denoted:

$$cons_P(\Delta) = \{ A \in DB_P^+ \mid \Delta \vdash_P A \}.$$

To derive suitable hypotheses for a given disjunctive program, some constraints 
will be required to filter out unintuitive hypotheses. 

Definition 3. Let $\Delta$ and $\Delta'$ be two hypotheses of disjunctive program P. If at
least one of the following two conditions holds:

1. there exists $\beta = \mathit{not}\ b_1 \vee \cdots \vee \mathit{not}\ b_m \in \Delta'$, $m > 0$, such that
   $\Delta \vdash_P b_i$ for all $i = 1, \ldots, m$; or

2. there exist $\mathit{not}\ b_1, \ldots, \mathit{not}\ b_m \in \Delta'$, $m > 0$, such that
   $\Delta \vdash_P b_1 \vee \cdots \vee b_m$,

then we say $\Delta$ attacks $\Delta'$, denoted $\Delta \leadsto_P \Delta'$.

Intuitively, $\Delta \leadsto_P \Delta'$ means that $\Delta$ causes a direct contradiction with $\Delta'$
and the contradiction may come from one of the above two cases.

¹ You et al. [15] also defined an argumentative extension to the disjunctive
stable semantics. However, as the authors have observed, their framework does
not lead to an intuitive well-founded semantics for DLP.

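Example 1.

$a \vee b \leftarrow$
$c \leftarrow d, \mathit{not}\ a, \mathit{not}\ b$
$d \leftarrow$
$e \leftarrow \mathit{not}\ e$

Let $\Delta' = \{\mathit{not}\ c\}$ and $\Delta = \{\mathit{not}\ a, \mathit{not}\ b\}$, then $\Delta \leadsto_P \Delta'$.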
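
Definitions 1 to 3 can be checked mechanically on Example 1. The following Python is a minimal sketch under stated assumptions, not the procedure of [13]: a rule is a hypothetical triple (head, positive body, atoms under "not"), a hypothesis is a set of frozensets (each frozenset standing for one negative disjunction), and classical entailment is decided by brute-force truth tables. Because entailment of a disjunction is preserved when disjuncts are added, the sketch tests support against the largest candidate disjunction only.

```python
from itertools import product

def rule(head, pos=(), neg=()):
    return (frozenset(head), frozenset(pos), frozenset(neg))

def unit_atoms(hyp):
    """Atoms b whose single assumption 'not b' belongs to the hypothesis."""
    return {next(iter(d)) for d in hyp if len(d) == 1}

def reduct(program, hyp):
    """Definition 1: keep rules whose negative body lies in the hypothesis."""
    units = unit_atoms(hyp)
    return [(h, p) for h, p, n in program if n <= units]

def entails(pos_rules, disjunction, atoms):
    """Classical entailment of a positive disjunction by enumeration."""
    atoms = sorted(atoms)
    for bits in product([False, True], repeat=len(atoms)):
        model = {a for a, bit in zip(atoms, bits) if bit}
        is_model = all(bool(h & model) or not (p <= model) for h, p in pos_rules)
        if is_model and not (disjunction & model):
            return False
    return True

def supports(program, hyp, alpha):
    """Definition 2, tested on alpha extended by every unit atom of hyp."""
    units = unit_atoms(hyp)
    atoms = set().union(*(h | p | n for h, p, n in program)) | set(alpha) | units
    return entails(reduct(program, hyp), frozenset(alpha) | units, atoms)

def attacks(program, hyp, other):
    """Definition 3: condition 1 or condition 2."""
    cond1 = any(d and all(supports(program, hyp, {b}) for b in d) for d in other)
    units = unit_atoms(other)
    cond2 = bool(units) and supports(program, hyp, units)
    return cond1 or cond2

# Example 1, with single-letter atoms.
P = [rule("ab"), rule("c", pos="d", neg="ab"), rule("d"), rule("e", neg="e")]
delta = {frozenset("a"), frozenset("b")}   # {not a, not b}
delta_prime = {frozenset("c")}             # {not c}
print(attacks(P, delta, delta_prime))      # True, via condition 1
```

The truth-table check is exponential in the number of atoms, so this is only meant for hand-sized programs such as the examples in this section.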

The next definition specifies what is an acceptable hypothesis. 

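Definition 4. Let $\Delta$ be a hypothesis of disjunctive program P. An assumption
B of P is admissible with respect to $\Delta$ if $\Delta \leadsto_P \Delta'$ holds for any hypothesis $\Delta'$
of P such that $\Delta' \leadsto_P \{B\}$.

Denote $\mathcal{A}_P(\Delta) = \{ \mathit{not}\ a_1 \vee \cdots \vee \mathit{not}\ a_m \in DB_P^- \mid \mathit{not}\ a_i$ is admissible wrt
$\Delta$ for some $1 \le i \le m \}$.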

Originally, $\mathcal{A}_P$ also includes some other negative disjunctions. To ease
comparison with other semantics, we omit them here. Another reason for doing
this is that information in the form of negative disjunctions does not
participate in inferring positive information in DLP.

For any disjunctive program P, $\mathcal{A}_P$ is a monotonic operator. Thus, if P is
finite then $\mathcal{A}_P$ has the least fixpoint $lfp(\mathcal{A}_P)$ and $lfp(\mathcal{A}_P) = \mathcal{A}_P^k(\emptyset)$ for some
$k \ge 0$.

Definition 5. The well-founded disjunctive hypothesis WFDH(P) of disjunctive
program P is defined as the least fixpoint of the operator $\mathcal{A}_P$. That is,
$WFDH(P) = \mathcal{A}_P \uparrow \omega$.

The well-founded disjunctive semantics WFDS for P is defined as the model
state $WFDS(P) = WFDH(P) \cup cons_P(WFDH(P))$.

By the above definition, WFDS(P) is uniquely determined by WFDH(P).

For the disjunctive program P in Example 1, $WFDH(P) = \{\mathit{not}\ c\}$ and
$WFDS(P) = \{a \vee b, d, \mathit{not}\ c\}$. Notice that e is unknown.

4 Transformation-Based Semantics 

In this section we study the relation of the argumentation-based semantics to
the transformation-based semantics. We first introduce a new program
transformation so as to simplify the rule heads of disjunctive programs and then
define a new transformation-based semantics (called D-WFS*) as the most
skeptical semantics that satisfies both our new program transformation and
Brass and Dix’s set $T_{WFS}$ of program transformations. Our new semantics D-WFS*
naturally extends the D-WFS in [2] and is no more skeptical than D-WFS. In fact,
this extension is meaningful because D-WFS seems too skeptical to derive useful
information from some disjunctive programs as the next example shows.

Example 2. John is traveling in Europe but we are not sure which city he is 
visiting. We know that, if there is no evidence to show that John is in Paris, 
he should be either in London or in Berlin. Also, we are informed that John is 
now visiting either London or Paris. This knowledge base can be conveniently 
expressed as the following disjunctive logic program P: 

$b \vee l \leftarrow \mathit{not}\ p$
$l \vee p \leftarrow$

Here, b, l and p denote that John is visiting Berlin, London and Paris,
respectively.

Intuitively, not b (i. e. John is not visiting Berlin) should be inferred from P. 
It can be verified that neither b nor its negation not b can be derived from P 
under D-WFS and STATIC while not b can be derived under WFDS. 

The intuition behind Minker’s Generalized Closed World Assumption (GCWA) 
[9] can be read off its proof-theoretic characterization: 

If, for every positive disjunction A, $P \vdash a \vee A$ implies $P \vdash A$, then not a is
derivable from P, where $\vdash$ is the inference relation in classical logic and P
is considered as a classical logic theory.

The above principle for positive DLP can be reformulated in general DLP as
follows:

If, for every conditional fact $a \vee A \leftarrow \mathit{not}\ C$, $P \vdash (a \vee A \leftarrow \mathit{not}\ C)$ implies
$P \vdash (A \leftarrow \mathit{not}\ C)$, then not a is derivable from P, where $\vdash$ is the inference
relation in classical logic and P is considered as a classical logic theory.

However, D-WFS does not obey the above principle as Example 2 shows. In
fact, $P \vdash (b \vee l \leftarrow \mathit{not}\ p)$ implies $P \vdash (l \leftarrow \mathit{not}\ p)$ since $l \vee p \leftarrow$ is in P. But
$\mathit{not}\ b \notin$ D-WFS(P).

According to [2], an abstract semantics can be defined as follows. 

Definition 6. A semantics S is a mapping which assigns to every disjunctive
program P a set S(P) of pure disjunctions such that the following conditions are
satisfied:

1. if A' is a sub-disjunction of a pure disjunction A and $A' \in S(P)$, then
   $A \in S(P)$;

2. if the rule $A \leftarrow$ is in P for a (positive) disjunction A, then $A \in S(P)$;

3. if a is an atom and $a \notin head(P)$ (i.e. a does not appear in the rule heads
   of P), then $\mathit{not}\ a \in S(P)$.

It should be noted that a semantics satisfying the above conditions is not nec- 
essarily a suitable one because Definition 6 is still very general. 
Besides the program transformations $T_{WFS}$ in [2], we also need a new program
transformation called Head reduction to define our semantics. This definition
is designed just to reflect the semantic intuition behind the GCWA as
mentioned at the beginning of this section.

Definition 7. An atom a in disjunctive program P is called GCWA-negated if,
for any rule r in P of form $a \vee A \leftarrow B, \mathit{not}\ c_1, \ldots, \mathit{not}\ c_t$, there is a rule
$A' \leftarrow$ in P such that A' is a sub-disjunction of $A \vee c_1 \vee \cdots \vee c_t$.

For instance, b can be GCWA-negated for the disjunctive program in Example 2. 
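
The test in Definition 7 is purely syntactic, which the following sketch tries to make visible. It is a hedged illustration: the triple encoding of rules and the function names are assumptions made here, and an atom that occurs in no rule head is treated as GCWA-negated because the condition then holds vacuously.

```python
def rule(head, pos=(), neg=()):
    # Hypothetical encoding: (head atoms, positive body atoms, atoms under "not").
    return (frozenset(head), frozenset(pos), frozenset(neg))

def gcwa_negated(atom, program):
    """Definition 7: every rule with `atom` in its head is covered by a fact."""
    facts = [h for h, p, n in program if not p and not n]
    for h, p, n in program:
        if atom in h:
            candidate = (h - {atom}) | n   # the disjunction A v c_1 v ... v c_t
            if not any(f <= candidate for f in facts):
                return False
    return True

# Example 2: b v l <- not p ;  l v p <-
P = [rule("bl", neg="p"), rule("lp")]
print([a for a in "blp" if gcwa_negated(a, P)])  # prints ['b']
```

Only b passes: the fact l v p is a sub-disjunction of l v p, the disjunction obtained from the first rule after dropping b and adding p.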



Definition 8. A rule r is an implication of another rule r' if $head(r') \subseteq head(r)$,
$body(r') \subseteq body(r)$ and at least one inclusion is proper.

The definition of our new semantics D-WFS* will be based on the set $T^*_{WFS}$ of the
following six program transformations. In the sequel, $P_1$ and $P_2$ are disjunctive
programs:

- Unfolding: $P_2$ is obtained from $P_1$ by unfolding if there is a rule
  $A \leftarrow b, B, \mathit{not}\ C$ in $P_1$ such that

  $$P_2 = P_1 - \{A \leftarrow b, B, \mathit{not}\ C\} \cup \{A \vee (A' - \{b\}) \leftarrow B, B', \mathit{not}\ C, \mathit{not}\ C' \mid \text{there is a rule of } P_1:\ A' \leftarrow B', \mathit{not}\ C' \text{ such that } b \in A'\}.$$

- Elimination of tautologies: $P_2$ is obtained from $P_1$ by elimination of
  tautologies if there is a rule $A \leftarrow B, \mathit{not}\ C$ in $P_1$ such that $A \cap B \neq \emptyset$
  and $P_2 = P_1 - \{A \leftarrow B, \mathit{not}\ C\}$.

- Elimination of nonminimal rules: $P_2$ is obtained from $P_1$ by elimination
  of nonminimal rules if there are two distinct rules r and r' of $P_1$ such that r
  is an implication of r' and $P_2 = P_1 - \{r\}$.

- Positive reduction: $P_2$ is obtained from $P_1$ by positive reduction if there
  is a rule $A \leftarrow B, \mathit{not}\ C$ in $P_1$ and $c \in C$ such that $c \notin head(P_1)$ and
  $P_2 = P_1 - \{A \leftarrow B, \mathit{not}\ C\} \cup \{A \leftarrow B, \mathit{not}\ (C - \{c\})\}$.

- Negative reduction: $P_2$ is obtained from $P_1$ by negative reduction if there
  are two rules $A \leftarrow B, \mathit{not}\ C$ and $A' \leftarrow$ in $P_1$ such that $A' \subseteq C$ and
  $P_2 = P_1 - \{A \leftarrow B, \mathit{not}\ C\}$.

- Head reduction: $P_2$ is obtained from $P_1$ by head reduction if there is a rule
  $a \vee A \leftarrow B, \mathit{not}\ C$ in $P_1$ such that a is GCWA-negated and
  $P_2 = P_1 \cup \{A \leftarrow B, \mathit{not}\ C\} - \{a \vee A \leftarrow B, \mathit{not}\ C\}$.
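
Two of the transformations above, positive and negative reduction, are simple enough to sketch directly. The Python below is illustrative and makes its own assumptions: rules are hypothetical triples (head, positive body, atoms under "not"), a program is a set of such triples, and each function applies its transformation exhaustively in one pass rather than one rule at a time as in the definitions.

```python
def rule(head, pos=(), neg=()):
    return (frozenset(head), frozenset(pos), frozenset(neg))

def positive_reduction(program):
    """Drop every 'not c' whose atom c occurs in no rule head."""
    heads = set().union(*(h for h, _, _ in program))
    return {(h, p, n & heads) for h, p, n in program}

def negative_reduction(program):
    """Drop every rule whose negative body contains all atoms of some fact."""
    facts = [h for h, p, n in program if not p and not n]
    return {(h, p, n) for h, p, n in program
            if not any(f <= n for f in facts)}

# a v b <- ;  c <- not a, not b ;  s <- not t
P = {rule("ab"), rule("c", neg="ab"), rule("s", neg="t")}
step1 = positive_reduction(P)        # t heads no rule, so s <- not t becomes s <-
step2 = negative_reduction(step1)    # the fact a v b removes c <- not a, not b
print(step2 == {rule("ab"), rule("s")})  # True
```

Positive reduction never changes rule heads, so a single pass already reaches its fixpoint; negative reduction only deletes rules with a non-empty negative body, so the facts it relies on survive.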

Example 3. Consider the disjunctive program P in Example 2. Since the atom b
is GCWA-negated, P can be transformed into the following disjunctive
program P' by Head reduction:

$l \leftarrow \mathit{not}\ p$
$l \vee p \leftarrow$

Suppose that S is a semantics. Then by Definition 6, $l \vee p \in S(P')$ and $\mathit{not}\ b \in S(P')$.
We say a semantics S satisfies a program transformation T (or, S is invariant
under T) if $S(P_1) = S(P_2)$ for any two disjunctive programs $P_1$ and $P_2$ with
$P_2 = T(P_1)$.

Let S and S' be two semantics. S is weaker than S' if $S(P) \subseteq S'(P)$ for any
disjunctive program P.

We present the main definition of this section as follows. 

Definition 9. (D-WFS*) The semantics D-WFS* for disjunctive programs is
defined as the weakest semantics allowing all program transformations in $T^*_{WFS}$.

This definition is not constructive and thus it cannot be directly used to compute
the semantics D-WFS* (a bottom-up procedure will be given in the next section). 
In the rest of this section, we first look at some properties of D-WFS*. 

As the following theorem shows, D-WFS* (P) is well-defined for every dis- 
junctive program P. This is guaranteed by the following two lemmas. 

Lemma 1. There is a semantics that satisfies all the program transformations
in $T^*_{WFS}$.

Lemma 2. Let $S_1$ and $S_2$ be two semantics satisfying $T^*_{WFS}$. Then their
intersection $S = S_1 \cap S_2$ is also a semantics and satisfies $T^*_{WFS}$.

Therefore, we have the following result which shows that semantics D-WFS* 
assigns the unique model state D-WFS* (P) for each disjunctive program P. 

Theorem 1. For any disjunctive program P, D-WFS* (P) is well-defined. 

Since the set $T_{WFS}$ of program transformations in [2] is a subset of $T^*_{WFS}$, our
D-WFS* extends the original D-WFS in the following sense.

Theorem 2. Let P be a disjunctive program. Then

$$\text{D-WFS}(P) \subseteq \text{D-WFS}^*(P).$$

The converse of Theorem 2 is not true in general. As we will see in Section 5, for
the disjunctive program P in Example 2, $\mathit{not}\ b \in$ D-WFS*(P) but $\mathit{not}\ b \notin$
D-WFS(P). This theorem also implies that D-WFS* extends the restricted
STATIC since D-WFS is equivalent to the restricted STATIC [3].

5 Bottom-Up Computation 

Parallel to the computation for D-WFS [2], we will first provide a bottom-up 
procedure for D-WFS* and then show the equivalence of D-WFS* and WFDS. 
As a result, we actually provide a bottom-up computation for WFDS. 

Let P be a disjunctive program. Our bottom-up computation for D-WFS* (P) 
consists of two stages. At the first stage, P is equivalently transformed into a 
negative program Lft(P) called the least fixpoint transformation. The details of 
this transformation can be found in [2,13]. The basic idea is to first evaluate body 
atoms of the rules in P but delay the negative body literals. The second stage 
is to further simplify Lft(P) into res*(P) from which the semantics D-WFS* (P) 
can be directly read off. 
5.1 Strong Residual Program

In general, the negative program Lft(P) can be further simplified by deleting 
unnecessary rules, unnecessary body literals and unnecessary head atoms. This 
leads to the idea of so-called reductions, which was first studied in [4] and then
generalized to the case of disjunctive logic programs in [2]. The reduction of a 
disjunctive program P is called the residual program of P. The following is a 
generalization of Brass and Dix’s residual programs. 

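Let $N_{GCWA}$ be the set of atoms that are GCWA-negated in disjunctive
program N. The reduction operator $R^*$ is defined as, for any negative program N
(i.e. a set of conditional facts),

$$R^*(N) = \{ (A - a) \leftarrow \mathit{not}\ (C \cap head(N)) \mid \text{there is a rule } r \in N:\ A \leftarrow \mathit{not}\ C \text{ such that}$$
$$\text{(1) there is no rule of form } A' \leftarrow \text{ with } A' \subseteq C,$$
$$\text{(2) there is no rule } r' \text{ s.t. } r \text{ is an implication of } r',$$
$$\text{(3) } a \in N_{GCWA} \text{ and } A - N_{GCWA} \neq \emptyset \}.$$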

The notion of the implication of rules can be found in Definition 8. For any
disjunctive program P, we can first transform it into the negative disjunctive
program Lft(P). Then, we fully perform the reduction $R^*$ on Lft(P) to obtain a
simplified negative program $res^*(P)$ (the strong residual program of P). The
iteration of $R^*$ stops after finitely many steps because $B_P$ contains a
finite number of atoms and the total number of atoms occurring in each N is
reduced by $R^*$. This procedure is precisely formulated in the next definition,
which has the same form as Definition 3.4 in [2] (the only difference is that
we have a new reduction operator $R^*$ here).

Definition 10. (strong residual program) Let P be a disjunctive program. Then
we have a sequence of negative programs $\{N_i\}_{i \ge 0}$ with $N_0 = Lft(P)$ and
$N_{i+1} = R^*(N_i)$. If $N_{t+1} = R^*(N_t) = N_t$ for some $t \ge 0$, then we call $N_t$ the
strong residual program of P and denote it by $res^*(P)$.

Since the Head reduction has been directly embedded into the operator R* , the 
following result can be obtained from Theorem 4.3 in [2], which guarantees the 
completeness of our bottom-up computation. 

Theorem 3. Let P and P' be two disjunctive programs. If P is transformed
into P' by a program transformation in $T^*_{WFS}$, then $res^*(P) = res^*(P')$.

This theorem has the following interesting corollary. 

Corollary 1. Let S be a semantics satisfying $S(P) = S(res^*(P))$ for all
disjunctive programs P. Then S allows all program transformations in $T^*_{WFS}$.

This corollary implies that, if $S_0$ is a mapping from the set of all strong residual
programs to the set of model states and it satisfies all properties in Definition 6,
then the mapping defined by $S(P) = S_0(res^*(P))$ is a semantics. Therefore, the
following lemma is obtained from the fact that D-WFS* is the weakest semantics.
Lemma 3. Given disjunctive program P, we have

$$\text{D-WFS}^*(res^*(P)) = \text{D-WFS}^*_{+}(res^*(P)) \cup \text{D-WFS}^*_{-}(res^*(P))$$

where

$$\text{D-WFS}^*_{+}(res^*(P)) = \{ A \in DB_P^+ \mid \text{rule } A' \leftarrow \text{ is in } res^*(P) \text{ for some sub-disjunction } A' \text{ of } A \}$$

$$\text{D-WFS}^*_{-}(res^*(P)) = \{ A \in DB_P^- \mid a \notin head(res^*(P)) \text{ for some atom } a \text{ appearing in } A \}$$

Thus, for any disjunctive program P, it is an easy task to get the semantics 
D-WFS*(res*(P)) of its strong residual program. 

The main theorem in this section can be stated as follows. 

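Theorem 4. For any disjunctive program P, we have

$$\text{D-WFS}^*(P) = \text{D-WFS}^*_{+}(P) \cup \text{D-WFS}^*_{-}(P)$$

where

$$\text{D-WFS}^*_{+}(P) = \{ A \in DB_P^+ \mid \text{rule } A' \leftarrow \text{ is in } res^*(P) \text{ for some sub-disjunction } A' \text{ of } A \}$$

$$\text{D-WFS}^*_{-}(P) = \{ A \in DB_P^- \mid a \notin head(res^*(P)) \text{ for some atom } a \text{ appearing in } A \}$$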

Example 4. Consider again the disjunctive program P in Example 2. The strong
residual program $res^*(P)$ is as follows:

$l \leftarrow \mathit{not}\ p$
$l \vee p \leftarrow$

Thus, D-WFS*(P) = $\{l \vee p, \mathit{not}\ b\}$.²
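
The second stage of the bottom-up computation can be sketched on Example 4. The Python below is a hedged reading rather than the author's implementation: it assumes the input is already a negative program (for Example 2 the program has no positive body atoms, so it is taken as its own Lft(P)), encodes a conditional fact as a hypothetical pair (head atoms, atoms under "not"), and interprets the head in the reduction operator as the original head minus all GCWA-negated atoms.

```python
def gcwa_negated_atoms(n):
    """Atoms of head(N) that pass the test of Definition 7."""
    facts = [h for h, neg in n if not neg]
    atoms = set().union(*(h for h, _ in n))
    return {a for a in atoms
            if all(any(f <= (h - {a}) | neg for f in facts)
                   for h, neg in n if a in h)}

def reduce_once(n):
    """One application of the reduction operator to a negative program."""
    heads = set().union(*(h for h, _ in n))
    facts = [h for h, neg in n if not neg]
    negated = gcwa_negated_atoms(n)
    out = set()
    for h, neg in n:
        if any(f <= neg for f in facts):
            continue  # condition (1): body contradicted by a fact
        if any(h2 <= h and n2 <= neg and (h2, n2) != (h, neg) for h2, n2 in n):
            continue  # condition (2): rule is an implication of another rule
        new_head = h - negated
        if new_head:  # condition (3): something must remain in the head
            out.add((new_head, neg & heads))
    return out

def strong_residual(n):
    """Iterate the reduction until nothing changes."""
    n = set(n)
    while True:
        nxt = reduce_once(n)
        if nxt == n:
            return n
        n = nxt

def read_off(atoms, residual):
    """Theorem 4, restricted to generators: facts and negated atoms."""
    positive = {h for h, neg in residual if not neg}
    heads = set().union(*(h for h, _ in residual))
    return positive, set(atoms) - heads

# Example 2: b v l <- not p ;  l v p <-
P = {(frozenset("bl"), frozenset("p")), (frozenset("lp"), frozenset())}
res = strong_residual(P)
positive, negative = read_off("blp", res)
print(sorted(negative))  # prints ['b']
```

On this input the loop stops after one real reduction step, yielding the two rules of Example 4, the positive disjunction l v p, and the negated atom b.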

5.2 Equivalence of WFDS and D-WFS*

Before we present the main theorem of this section, we need some properties of
WFDS. First, we can verify that WFDS is a semantics in the sense of
Definition 6. Moreover, it possesses the following two properties which can be
verified directly.

Proposition 1. WFDS satisfies all program transformations in $T^*_{WFS}$.

² D-WFS*(P) should include all pure disjunctions implied by either $l \vee p$ or $\mathit{not}\ b$.
However, this slight abuse of notation simplifies the presentation.
This proposition implies that the argumentation-based semantics WFDS is
always at least as strong as the transformation-based semantics D-WFS*.

The next result shows that the strong residual program $res^*(P)$ of a
disjunctive program P is equivalent to P w.r.t. the semantics WFDS. Therefore,
we can first transform P into $res^*(P)$ and then compute $WFDS(res^*(P))$.

Proposition 2. For any disjunctive program P,

$$WFDS(P) = WFDS(res^*(P)).$$

It has been shown in [2] that Lft and their reduction operator R can be simulated
by $T_{WFS} = T^*_{WFS} - \{\text{Head reduction}\}$. It follows that Lft and $R^*$ can be
simulated by $T^*_{WFS}$. Thus, the above proposition holds.

Now we can state the main result of this section, which asserts the equivalence 
of D-WFS* and WFDS. 

Theorem 5. For any disjunctive logic program P,

$$WFDS(P) = \text{D-WFS}^*(P).$$

An important implication of this result is that the well-founded semantics WFDS 
also enjoys a bottom-up procedure similar to the D-WFS. 



6 Unfounded Sets 



The first definition of the well-founded model [12] is given in terms of unfounded
sets and it has been proved that the notion of unfounded sets constitutes a
powerful and intuitive tool for defining semantics for logic programs. This notion
has also been generalized to characterize stable semantics for disjunctive logic
programs in [7,5]. However, the two kinds of unfounded sets defined in [7,5]
cannot be used to define an intended well-founded semantics for disjunctive
programs.



Example 5.³

$a \vee b \leftarrow$
$c \leftarrow \mathit{not}\ a, \mathit{not}\ b$



Intuitively, not c should be derived from the above disjunctive program and, in
fact, many semantics including D-WFS, STATIC and WFDS assign the truth value
"false" to c. However, according to the definitions of unfounded sets in [7,5], c is
not in any n-fold application of the well-founded operators on the empty set. For
this reason, a more reasonable definition of unfounded sets for disjunctive
programs is in order.

In this section, we will define a new notion of unfounded sets for disjunc- 
tive programs and show that the well-founded semantics U-WFS defined by our 
notion is equivalent to D-WFS* and WFDS. 

³ This example is due to Jürgen Dix (personal communication).
We say body(r) of $r \in P$ is true wrt model state S, denoted $S \models body(r)$, if
$body(r) \subseteq S$; body(r) is false wrt model state S, denoted $S \models \neg body(r)$, if either
(1) the complement of a literal in body(r) is in S or (2) there is a disjunction
$a_1 \vee \cdots \vee a_n \in S$ such that $\{\mathit{not}\ a_1, \ldots, \mathit{not}\ a_n\} \subseteq body(r)$.

In Example 5, the second rule is false wrt $S = \{a \vee b\}$.

Definition 11. Let S be a model state of disjunctive program P. A set X of
ground atoms is an unfounded set for P wrt S if, for each $a \in X$ and each rule
$r \in P$ such that $a \in head(r)$, at least one of the following conditions holds:

1. the body of r is false wrt S;

2. there is $x \in X$ such that $x \in body^+(r)$;

3. if $S \models body(r)$, then $S \models (head(r) - X)$.

Notice that the above definition generalizes the notions of unfounded sets in [7,5]
in two ways. Firstly, the original ones are defined only for interpretations (sets
of ground literals) rather than for model states. An interpretation is a model
state but not vice versa. Secondly, though one can redefine the original notions of
unfounded sets for model states, such unfounded sets are still too weak to capture
the intended well-founded semantics of some disjunctive programs. Consider
Example 5 and let $S = \{a \vee b\}$. According to Definition 11, the set {c} is an
unfounded set of P wrt S, but {c} is not an unfounded set in the sense of Leone or Eiter.
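
The three conditions of Definition 11 can be transcribed almost literally, which makes the two examples in this section easy to re-check. The sketch below is only that, a transcription under assumptions made here: a model state is a hypothetical pair (positive generators, negative generators) of sets of frozensets, satisfaction is tested through sub-disjunctions, and condition 3 is coded as the material implication exactly as it is stated.

```python
def rule(head, pos=(), neg=()):
    return (frozenset(head), frozenset(pos), frozenset(neg))

def holds(generators, atoms):
    """Some generating disjunction is a sub-disjunction of `atoms`."""
    return any(d <= atoms for d in generators)

def body_true(r, state):
    _, p, n = r
    pos, neg = state
    return all(holds(pos, {b}) for b in p) and all(holds(neg, {c}) for c in n)

def body_false(r, state):
    _, p, n = r
    pos, neg = state
    # (1) 'not b' is in S for a body atom b, or (2) a positive disjunction of S
    # has all of its atoms negated in the body (this also covers a single atom).
    return any(holds(neg, {b}) for b in p) or holds(pos, n)

def is_unfounded(x, program, state):
    pos, _ = state
    for r in program:
        h, p, _ = r
        if not (h & x):
            continue
        c1 = body_false(r, state)
        c2 = bool(p & x)
        c3 = (not body_true(r, state)) or holds(pos, h - x)
        if not (c1 or c2 or c3):
            return False
    return True

# Example 5 with S = {a v b}: the set {c} is unfounded.
P5 = [rule("ab"), rule("c", neg="ab")]
S5 = ({frozenset("ab")}, set())
print(is_unfounded({"c"}, P5, S5))  # True, by condition 1

# P = {a v b}, S = {a, b}: {a} and {b} are unfounded, {a, b} is not.
P2 = [rule("ab")]
S2 = ({frozenset("a"), frozenset("b")}, set())
print(is_unfounded({"a"}, P2, S2), is_unfounded({"a", "b"}, P2, S2))  # True False
```

The second check illustrates why a greatest unfounded set need not exist: the union of two unfounded sets can fail condition 3.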

Having the new notion of unfounded sets, we are ready to define the well-known
operator $W_P$ for any disjunctive program P.

If P has the greatest unfounded set wrt a model state S, we denote it $U_P(S)$.
However, $U_P(S)$ may be undefined for some S. For example, let $P = \{a \vee b\}$ and
$S = \{a, b\}$. Then $X_1 = \{a\}$ and $X_2 = \{b\}$ are two unfounded sets wrt S but
$X = \{a, b\}$ is not. Here we will not discuss the operator $U_P(S)$ in detail.

Definition 12. Let P be a disjunctive program. The operator $T_P$ is defined as,
for any model state S,

$$T_P(S) = \{ A \in DB_P^+ \mid \text{there is a rule } r \in P:\ A \vee a_1 \vee \cdots \vee a_n \leftarrow body(r) \text{ such that } S \models body(r) \text{ and } \mathit{not}\ a_1, \ldots, \mathit{not}\ a_n \in S \}.$$

Notice that $T_P(S)$ is a set of positive disjunctions rather than just a set of atoms.

Definition 13. Let P be a disjunctive program. The operator $W_P$ is defined as,
for any model state S,

$$W_P(S) = T_P(S) \cup \mathit{not}.U_P(S)$$

where $\mathit{not}.U_P(S) = \{\mathit{not}\ p \mid p \in U_P(S)\}$.

In general, $W_P$ is a partial function because there may be no greatest unfounded
set wrt model state S as mentioned previously.

However, we can prove that $W_P$ has the least fixpoint. Given a disjunctive
program P, we define a sequence of model states $\{W_k\}_{k \in \mathbb{N}}$ where $W_0 = \emptyset$
and $W_k = W_P(W_{k-1})$ for $k > 0$.

Similar to Proposition 5.6 in [7], we can prove the following proposition. 



A Comparative Study of Well-Founded Semantics 



145 



Proposition 3. Let P be a disjunctive program. Then

1. Every model state $W_k$ is well-defined and the sequence $\{W_k\}_{k \in \mathbb{N}}$ is
   increasing.

2. The limit $\bigcup_{k \ge 0} W_k$ of the sequence $\{W_k\}_{k \in \mathbb{N}}$ is the least fixpoint of $W_P$.

Since we consider only finite propositional programs in this paper, there is some
$t \ge 0$ such that $W_t = W_{t+1}$.

The well-founded semantics U-WFS is defined by 

$$\text{U-WFS}(P) = lfp(W_P).$$

For the program P in Example 5, U-WFS(P) = $\{a \vee b, \mathit{not}\ c\}$.

An important result is that WFDS (and thus D-WFS*) can also be equivalently
characterized in terms of the unfounded sets defined in this section.

Theorem 6. For any disjunctive program P,

$$WFDS(P) = \text{U-WFS}(P).$$

Theorem 6 provides further evidence for suitability of WFDS (equivalently, 
D-WFS*) as the intended well-founded semantics for disjunctive logic programs. 
By the following lemma, we can directly prove Theorem 6. 

Lemma 4. Let P be a disjunctive program. Then $W_k = S_k$ for any $k \ge 0$.

This lemma also reveals a kind of correspondence between the well-founded 
disjunctive hypotheses and the unfounded sets. 

7 Conclusion 

In this paper we have investigated recent approaches to defining well-founded 
semantics for disjunctive logic programs. We first provided a minor modification 
of the argumentative semantics WFDS defined in [13]. Based on some intu- 
itive program transformations, we proposed an extension to the D-WFS in [2]. 
In our approach, we introduce a new program transformation called Head re- 
duction. This transformation plays a similar role in DLP as the GCWA [9] in 
positive DLP. We have also given a new definition of the unfounded sets for dis- 
junctive programs, which is a generalization of the unfounded sets investigated 
by [7,5]. This new notion of unfounded sets fully takes disjunctive information 
into consideration and provides another characterization for disjunctive well- 
founded semantics. The main contribution of this paper is the equivalence of 
U-WFS, D-WFS* and WFDS. We have also provided a bottom-up computation
for our semantics. A top-down procedure is presented in [14], which is sound 
and complete with respect to our semantics. These results show that there ex- 
ists a disjunctive well-founded semantics which can be characterized in terms 
of argumentation, program transformations, unfounded sets and resolution. The 
fact that different starting points lead to the same semantics provides strong 
support for WFDS. Future work will concentrate on more efficient algorithms 
and applications. 
Abstract. This paper motivates and introduces entailment problems
over nonmonotonic theories some of whose predicates — called open pred- 
icates — are not (completely) specified. More precisely, we are interested 
in those inferences that hold for some or all possible axiomatizations of 
the open predicates. Since a complete specification of an open predicate 
may model incomplete knowledge about the world, this kind of inference 
should distinguish missing object-level knowledge from missing parts of 
the specification, and restrict nonmonotonic inference accordingly. We 
formalize some interesting forms of such open entailment problems, and 
provide formal proof techniques for some of them in a logic programming 
framework. 



1 Introduction 

In this paper we tackle the problem of deciding whether a given formula is 
entailed by a nonmonotonic theory which has not been completely specified. 
The motivation for this work stems from several applications areas, including 
the following: 

- Agent program verification. Given a logic-based agent (such as an IMPACT
agent [13]), it may be necessary to verify its correct behavior by proving
that certain actions will never be executed, or that some action will surely 
be taken under given circumstances. The agent’s actions are determined by 
entailment from a logic program whose details cannot be fully specified at 
verification time (e.g., the precise definition of the agent’s beliefs and goals 
would most likely be unavailable). 

- Reasoning about actions and change when the effects of some actions, or the
causal links between certain fluents, have not been specified (e.g., because 
they have not yet been identified). 

- Security policy verification. Security policies are often modelled and specified
by means of nonmonotonic theories, either directly [14,12] or indirectly, by 
translating the specifications into logic programs with negation [2,4]. Part 
of the security policy may be unknown [4], e.g., because it is to be decided 
by a different organization, or because it is subject to changes. Thus, some 
of the predicates in the corresponding logic program are undefined at policy 
design time. Policies should be verified by proving that certain authorizations 
will/will not be granted (i.e., certain atoms will/will not be derivable), no 
matter how the missing details are filled in (see [4] for further details).

In all these examples, standard nonmonotonic semantics would treat missing
predicates as if they were false for all arguments. Clearly, this is not appropri- 
ate for the above reasoning tasks. One should rather consider all the possible 
complete definitions of those predicates. More generally, if a predicate is par- 
tially specified, all the complete definitions compatible with the available details 
should be considered. In classical logic, this would be equivalent to proving that 
a certain formula is a logical consequence of the incomplete specification. In 
a nonmonotonic setting, we must identify hybrid inference mechanisms that
lie somewhere in between classical and nonmonotonic deduction. In particular,
negation as failure should not be applied to any predicate whose definition is 
not complete. 

We start a formal investigation of these aspects by focussing on normal 
logic programs under the stable model semantics (which underlies, more or less
explicitly, all the aforementioned verification problems). Using existing
terminology [6], by open program we mean a normal logic program whose domain and
predicates are not completely specified. Section 3 formalizes open programs and 
some related, interesting inference problems. Section 4 introduces provably sound 
and complete techniques for solving some of those problems, under suitable as- 
sumptions. These techniques are based on the skeptical resolution calculus — that 
can handle open domains — which is recalled in Section 2. Section 6 concludes 
the paper with a list of interesting open problems and some related work. 

2 Preliminaries 

We assume the reader to be familiar with the standard notation and results on 
logic programming [10] and the stable model semantics [8]. 

Let metavariable P range over normal logic programs, and let Ground(P) 
denote the ground instantiation of P. We recall that a support of a ground 
atom A from P is a set of negative literals obtained by recursively unfolding A 
and its positive subgoals in Ground(P), until only negative literals are left. 

In the main part of the paper, the skeptical resolution calculus introduced 
in [3] will be adapted to open entailment. In the rest of this section we recall the 
basic definitions. 

A ground countersupport for a ground atom A from P is a set of positive 
literals K such that: 

1. each $B \in K$ is the complement of some literal belonging to a support of A
from P;

2. conversely, each support of A from P contains a literal whose complement 
is in K. 

A (nonground) countersupport of an arbitrary atom A from P is a pair $(K, \theta)$
such that for all ground instances $A\theta\sigma$, $K\sigma$ is a ground countersupport of $A\theta\sigma$.

The skeptical resolution calculus is formulated independently of any specific
mechanism for computing negation as failure. Such a mechanism is abstracted by
a function CounterSupp that maps each atom A onto a (possibly empty) set of
nonground countersupports for A.

Let P be an arbitrary given program. A (simple) goal is a finite sequence of
literals. A goal with hypotheses (h-goal for short) is a pair $(G \mid H)$, where G is a
simple goal and H is a multiset of (positive or negative) literals called hypotheses.
Roughly speaking, the answer to a query $(G \mid H)$ should be yes if G holds in all
the stable models that satisfy H. Finally, a skeptical goal (s-goal for short) is a
finite sequence of h-goals; the empty sequence is denoted by □.

A skeptical derivation from P and CounterSupp with restart goal $G_0$ is a
(possibly infinite) sequence of s-goals $\mathcal{G}_0, \mathcal{G}_1, \ldots$, where each $\mathcal{G}_{i+1}$ is obtained
from $\mathcal{G}_i$ through one of the following rewrite rules ($\Gamma$ and $\Delta$ are sequences of
h-goals).¹

Resolution. This rule may take two forms; a literal can be unified with either
a program rule or a hypothesis. First suppose that $L_i$ is an atom,
$A \leftarrow B_1, \ldots, B_k$ is a standardized apart variant of a rule of P, and $\theta$ is the mgu
of $L_i$ and $A$. Then the following is an instance of the rule.

\[ \frac{\Gamma\ (L_1 \ldots L_{i-1}, L_i, L_{i+1} \ldots L_n \mid H)\ \Delta}{[\Gamma\ (L_1 \ldots L_{i-1}, B_1, \ldots, B_k, L_{i+1} \ldots L_n \mid H)\ \Delta]\theta} \]

Secondly, let $L_i$ be a (possibly negative) literal, let $L'$ be a hypothesis, and
let $\theta$ be the mgu of $L_i$ and $L'$. Then the following is an instance of the rule.

\[ \frac{\Gamma\ (L_1 \ldots L_{i-1}, L_i, L_{i+1} \ldots L_n \mid H, L')\ \Delta}{[\Gamma\ (L_1 \ldots L_{i-1}, L_{i+1} \ldots L_n \mid H, L')\ \Delta]\theta} \]

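Failure. Suppose that $L_i = \neg A$, and $(\{B_1, \ldots, B_k\}, \theta) \in \mathrm{CounterSupp}(A)$.
Then the following is an instance of the Failure rule.

\[ \frac{\Gamma\ (L_1 \ldots L_{i-1}, L_i, L_{i+1} \ldots L_n \mid H)\ \Delta}{[\Gamma\ (L_1 \ldots L_{i-1}, B_1, \ldots, B_k, L_{i+1} \ldots L_n \mid H)\ \Delta]\theta} \]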

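Contradiction. This rule tries to prove $(G \mid H)$ by showing that H cannot be
satisfied by any stable model of P. Here $\overline{L}$ denotes the complement of L.

\[ \frac{\Gamma\ (G \mid H, L)\ \Delta}{\Gamma\ (\overline{L} \mid H, L)\ \Delta} \]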

Split. Essentially, this rule is needed to compute floating conclusions and
discover contradictions. It splits the search space by introducing a new
hypothesis. Let $G_0$ be the restart goal, L be an arbitrary literal with
complement $\overline{L}$, and $\sigma$ be the composition of the mgus previously
computed during the derivation; the Split rule is:

\[ \frac{\Gamma\ (G \mid H)\ \Delta}{\Gamma\ (G \mid H, L)\ (G_0\sigma \mid H, \overline{L})\ \Delta} \]

The h-goals $(G \mid H, L)$ and $(G_0\sigma \mid H, \overline{L})$ are called restart h-goals.



¹ The restart goal $G_0$ is needed in the Split rule.

Success.

\[ \frac{\Gamma\ (\Box \mid H)\ \Delta}{\Gamma\ \Delta} \]

A skeptical derivation is successful if the last s-goal is □; in this case we
say that the first s-goal $\mathcal{G}_0$ has a successful skeptical derivation from P. As
usual, the composition of the mgus computed during the derivation, restricted
to the variables of $\mathcal{G}_0$, is called answer substitution. Skeptical resolution is sound
and complete w.r.t. the skeptical stable model semantics, under a completeness 
assumption over CounterSupp (see [3] for further details). 

There exist derivation strategies that restrict the application of the Split
rule. Such strategies are strictly goal-directed for call-consistent programs. A
prototype based on a semi-naive metainterpreter has been
implemented in XSB Prolog (http://xsb.sourceforge.net). (Further details can
be found in the journal version of [3].)

3 Open Programs and Open Entailment 

In order to avoid ill-formed, possibly paradoxical definitions, assume two fixed, 
infinite sets of function and predicate symbols are given, and denote them with 
Func and Pred, respectively (as usual, constant symbols are identified with 0-ary 
functions). Moreover, let Var be an infinite set of variable symbols (following 
Prolog’s conventions, they will be denoted with uppercase letters). From now
on, we shall consider only normal logic programs built from these sets (when we
write “for all programs” or “there exists a program” we implicitly restrict the
quantification accordingly).

Intuitively, an open program is a partially specified program P. Some pred- 
icates, called “open predicates”, are not completely specified in P, in the sense 
that their definition might be completely missing, or it might contain only some 
of the rules that define the predicate. The set of open predicates will be identified
with a set of symbols $O \subseteq \mathrm{Pred}$. Moreover, the missing clauses might contain
function symbols that do not appear in P. Such symbols are listed in a set
$F \subseteq \mathrm{Func}$.

Definition 1 (Open program). An open program is a triple $\langle P, F, O \rangle$ where
P is a normal logic program, $F \subseteq \mathrm{Func}$, and $O \subseteq \mathrm{Pred}$. The symbols in F should
not occur in P (while the symbols in O may occur in P).

For any given open program $\langle P, F, O \rangle$, an open atom (resp. literal) is any atom
(resp. literal) whose predicate belongs to O.

The next definition models all the possible ways of filling in the missing 
details of an open program. 

Definition 2 (Open program completions). Let $\Omega = \langle P, F, O \rangle$ be an open
program. A normal program $P'$ is a completion² of $\Omega$ if the following conditions
hold:

1. $P' \supseteq P$;

2. the function symbols occurring in $P'$ but not in P belong to F;

3. for all $r \in P' \setminus P$, the predicate symbol in the head of r belongs to O.

The set of all possible completions of $\Omega$ will be denoted by $\mathrm{Comp}(\Omega)$ or,
equivalently, by $\mathrm{Comp}(P, F, O)$.

² The word “completion”, referred to normal programs, traditionally denotes Clark’s
completion [10]. Unfortunately, the author could not find any suggestive alternative.
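The three conditions of Definition 2 can be checked mechanically. The following is a minimal sketch, assuming function-free programs and a hypothetical tuple encoding that is not part of the paper: an atom is `(predicate, args)`, a string starting with an uppercase letter is a variable, any other string is a constant, and a rule is `(head, body)` with `body` a tuple of `(is_positive, atom)` pairs. The names `constants` and `is_completion` are illustrative only.

```python
def constants(rule):
    """Return the constant symbols (0-ary function symbols) occurring in a rule."""
    head, body = rule
    atoms = [head] + [atom for _sign, atom in body]
    return {t for _pred, args in atoms for t in args if not t[0].isupper()}


def is_completion(P, F, O, P_prime):
    """Check conditions 1-3 of Definition 2 (function-free case only)."""
    P, P_prime = set(P), set(P_prime)
    if not P <= P_prime:                          # 1. P' includes P
        return False
    old = set().union(*map(constants, P))
    new = set().union(*map(constants, P_prime))
    if not (new - old) <= set(F):                 # 2. new symbols come from F
        return False
    # 3. every added rule has an open predicate in its head
    return all(head[0] in O for head, _body in P_prime - P)


# The program P of Example 3: p(a, X) <- not q(X), with F = {b} and O = {q}.
P = [(("p", ("a", "X")), ((False, ("q", ("X",))),))]
P2 = P + [(("q", ("b",)), ())]            # adds the fact q(b)
not_open = P + [(("p", ("b", "b")), ())]  # adds a rule for the closed predicate p
bad_symbol = P + [(("q", ("c",)), ())]    # uses a constant outside F

print(is_completion(P, {"b"}, {"q"}, P2))
print(is_completion(P, {"b"}, {"q"}, not_open))
print(is_completion(P, {"b"}, {"q"}, bad_symbol))
```

Only the first candidate passes all three checks; the second violates condition 3 and the third violates condition 2.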

Example 3. Consider an open program $\Omega$ with

$P = \{\, p(a, X) \leftarrow \neg q(X) \,\}$,

$F = \{b\}$,

$O = \{q\}$.

Some of the completions in $\mathrm{Comp}(\Omega)$ are:

$P_1 = P \cup \{q(a)\}$,

$P_2 = P \cup \{q(b)\}$,

$P_3 = P \cup \{q(X) \leftarrow \neg p(X, Y)\}$,

$P_4 = P \cup \{q(b) \leftarrow \neg q(b)\}$.

These programs differ in many respects. The Herbrand domain of $P_1$ and $P_3$
coincides with the domain of P, while the domain of $P_2$ and $P_4$ is extended
with b. Programs $P_1$ and $P_2$ are stratified while $P_3$ and $P_4$ are not. Program $P_3$
has two stable models, while $P_4$ has no stable models. □
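The claims about stable models in Example 3 can be reproduced with a brute-force check on the ground instantiations. This is only an illustrative sketch (exponential in the number of atoms), not the calculus discussed in the paper; the helper names `rule`, `least_model` and `stable_models`, and the encoding of ground atoms as strings, are assumptions made here.

```python
from itertools import chain, combinations


def rule(head, pos=(), neg=()):
    """Ground rule: head <- pos_1, ..., not neg_1, ..."""
    return (head, frozenset(pos), frozenset(neg))


def least_model(definite_rules):
    """Least Herbrand model of a ground definite program."""
    model, changed = set(), True
    while changed:
        changed = False
        for head, pos in definite_rules:
            if head not in model and pos <= model:
                model.add(head)
                changed = True
    return model


def stable_models(rules):
    """Enumerate the stable models of a ground normal program by brute force."""
    atoms = sorted({r[0] for r in rules} | set().union(*[r[1] | r[2] for r in rules]))
    candidates = chain.from_iterable(
        combinations(atoms, k) for k in range(len(atoms) + 1)
    )
    result = []
    for cand in map(set, candidates):
        # Gelfond-Lifschitz reduct: drop rules blocked by cand, erase negation.
        reduct = [(h, pos) for h, pos, neg in rules if not (neg & cand)]
        if least_model(reduct) == cand:
            result.append(cand)
    return result


# Ground instantiations of the completions of Example 3.
P1 = [rule("p(a,a)", neg=["q(a)"]), rule("q(a)")]
P2 = [rule("p(a,a)", neg=["q(a)"]), rule("p(a,b)", neg=["q(b)"]), rule("q(b)")]
P3 = [rule("p(a,a)", neg=["q(a)"]), rule("q(a)", neg=["p(a,a)"])]
P4 = [rule("p(a,a)", neg=["q(a)"]), rule("p(a,b)", neg=["q(b)"]),
      rule("q(b)", neg=["q(b)"])]

for name, prog in [("P1", P1), ("P2", P2), ("P3", P3), ("P4", P4)]:
    print(name, stable_models(prog))
```

Under these assumptions, P1 and P2 each have one stable model, P3 has two, and P4 has none, in agreement with the example.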

In the context of a given open program $\langle P, F, O \rangle$, a ground literal is a
variable-free literal belonging to the language of some $P' \in \mathrm{Comp}(P, F, O)$
or, equivalently, any ground literal built with the symbols occurring in P, F
and O.

We are ready to formalize entailment from open programs. In the following, 
by consistent program we mean a normal logic program with at least one stable 
model. 

Definition 4 (Open inference). For all open programs $\Omega = \langle P, F, O \rangle$ and
all first-order sentences $\Psi$,

1. (Credulous open inference) $\Omega \models^{c} \Psi$ iff for some $P' \in \mathrm{Comp}(\Omega)$, $P'$
credulously entails $\Psi$.

2. (Skeptical open inference) $\Omega \models^{s} \Psi$ iff for all $P' \in \mathrm{Comp}(\Omega)$, $P'$ skeptically
entails $\Psi$.

3. (Mixed open inference I) $\Omega \models^{cs} \Psi$ iff for some consistent $P' \in \mathrm{Comp}(\Omega)$, $P'$
skeptically entails $\Psi$.

4. (Mixed open inference II) $\Omega \models^{sc} \Psi$ iff for all consistent $P' \in \mathrm{Comp}(\Omega)$, $P'$
credulously entails $\Psi$.

Note that without the consistency requirement on $P'$, mixed open inference
would be trivial in most cases. It is often possible to build a pathological rule
$p(a) \leftarrow \neg p(a)$ from the symbols in F and O, and obtain an inconsistent
$P' \in \mathrm{Comp}(\Omega)$. Then (without the consistency requirement) for all sentences $\Psi$ we
would have $\Omega \models^{cs} \Psi$ and $\Omega \not\models^{sc} \Psi$.

The four forms of open entailment combine two aspects: 

- The quantification on $P'$ captures the kind of property to be verified. We
may be interested either in proving that in some case something happens
(e.g., if open predicates are completed in certain ways, then the program
may make errors) or that in all cases some property is guaranteed (e.g., the
program will always operate correctly, no matter how missing details are
fixed).

- Credulous and skeptical stable model semantics are the two basic semantics
available at the underlying application level.

Example 5. Consider the open program and the completions illustrated in
Example 3. Since $p(a, a)$ is true in the unique stable model of $P_2$, we have
both $\Omega \models^{c} p(a, a)$ and $\Omega \models^{cs} p(a, a)$. However, $p(a, a)$ is not in the stable model
of $P_1$, so $\Omega \not\models^{s} p(a, a)$. The sentence $q(a)$ is credulously entailed by $P_1$ and $P_3$,
but not by $P_2$ ($P_4$ is ignored because it is inconsistent), so $\Omega \not\models^{sc} q(a)$. □
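The four relations of Definition 4 can be evaluated on a finite sample of completions once their stable models are known. The sketch below is an illustration under stated assumptions: each completion is represented only by the list of its stable models (sets of ground atoms), the query is a single ground atom, and the function name `open_inference` and the keys "c", "s", "cs", "sc" are labels chosen here. Since the set of all completions is infinite in general, a positive answer for "c" or "cs" is witnessed by the sample, whereas a positive answer for "s" or "sc" only means that the sample contains no counterexample.

```python
def credulous(models, atom):
    """The atom holds in at least one stable model."""
    return any(atom in m for m in models)


def skeptical(models, atom):
    """The atom holds in every stable model (trivially true if there is none)."""
    return all(atom in m for m in models)


def open_inference(completions, atom):
    """Evaluate the four relations of Definition 4 over a finite sample.

    completions: one list of stable models per sampled completion.
    """
    consistent = [ms for ms in completions if ms]
    return {
        "c": any(credulous(ms, atom) for ms in completions),
        "s": all(skeptical(ms, atom) for ms in completions),
        "cs": any(skeptical(ms, atom) for ms in consistent),
        "sc": all(credulous(ms, atom) for ms in consistent),
    }


# Stable models of the four completions P1..P4 listed in Example 3.
sample = [
    [{"q(a)"}],               # P1
    [{"p(a,a)", "q(b)"}],     # P2
    [{"p(a,a)"}, {"q(a)"}],   # P3
    [],                       # P4 (inconsistent)
]

print(open_inference(sample, "p(a,a)"))
print(open_inference(sample, "q(a)"))
```

On this sample both atoms are credulously and "cs" entailed, while the "s" and "sc" relations fail for both because some consistent completion is a counterexample.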

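When $\Omega$ is not intrinsically inconsistent, then the four kinds of entailment
can be compared as stated by the next proposition.

Proposition 6. Suppose there exists a consistent $P' \in \mathrm{Comp}(\Omega)$. Then, for all
sentences $\Psi$,

1. $\Omega \models^{s} \Psi$ implies $\Omega \models^{cs} \Psi$ and $\Omega \models^{sc} \Psi$;

2. $\Omega \models^{cs} \Psi$ implies $\Omega \models^{c} \Psi$;

3. $\Omega \models^{sc} \Psi$ implies $\Omega \models^{c} \Psi$.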

Thus, we get a lattice of entailment relations, where skeptical open entailment 
is the strongest and credulous open entailment the weakest. 

There is also a duality between pairs of entailments, which is helpful, as the 
four inference problems can be reduced to only two problems. 

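Proposition 7. For all open programs $\Omega$ and all sentences $\Psi$,

1. $\Omega \models^{c} \Psi$ iff $\Omega \not\models^{s} \neg\Psi$;

2. $\Omega \models^{cs} \Psi$ iff $\Omega \not\models^{sc} \neg\Psi$.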
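The first duality follows by unfolding Definition 4. A short derivation is sketched below, writing $\Psi$ for the sentence and $\mathrm{SM}(P')$ for the set of stable models of $P'$; the latter is a notation assumed here, not used in the paper. The middle step relies on the fact that stable models are two-valued, so $M \models \Psi$ iff $M \not\models \neg\Psi$.

```latex
\begin{align*}
\Omega \models^{c} \Psi
  &\iff \exists P' \in \mathrm{Comp}(\Omega)\ \exists M \in \mathrm{SM}(P') :\ M \models \Psi \\
  &\iff \neg\, \forall P' \in \mathrm{Comp}(\Omega)\ \forall M \in \mathrm{SM}(P') :\ M \models \neg\Psi \\
  &\iff \Omega \not\models^{s} \neg\Psi .
\end{align*}
```

The second duality is obtained in the same way, with both quantifiers over completions restricted to consistent ones.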

Therefore, in the following we shall focus on $\models^{s}$ and $\models^{cs}$. They are based on
skeptical inference, which (unlike credulous approaches) does not need the
program to be instantiated before reasoning. Since, in general, the set of terms is
not exactly specified, such instantiation may be expensive or even impossible
(e.g., theoretically speaking, in the security policy verification problem the set
of constants has no fixed a priori bound, as constants correspond to user names
and data objects; in practice, their number is only bounded by operating system
limitations, and is considerably high). Currently, it is not clear to what extent
credulous approaches to open entailment are feasible.

4 Approaches to Skeptical Open Entailment

The skeptical resolution calculus can be adapted to open entailment by a few 
modifications. Some of them essentially state that open literals (both positive 
and negative ones) should be treated like negative literals. 

Accordingly, an open program $\langle P, F, O \rangle$ is range restricted if each variable
occurring within an open or negative literal in the body of a rule $r \in P$ occurs
either in the head of r or in a positive, non-open literal of r.
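Range restriction is a purely syntactic condition and can be tested rule by rule. A minimal sketch follows, assuming a hypothetical encoding in which an atom is `(predicate, args)`, uppercase-initial strings are variables, and a rule is `(head, body)` with `body` a tuple of `(is_positive, atom)` pairs; `is_range_restricted` is an illustrative name, not taken from the paper.

```python
def is_var(term):
    """Prolog convention: variables start with an uppercase letter."""
    return term[:1].isupper()


def is_range_restricted(rules, open_preds):
    """Check the range restriction condition for every rule of the program."""
    for head, body in rules:
        # Variables bound by the head or by a positive, non-open body literal.
        safe = {t for t in head[1] if is_var(t)}
        for positive, (pred, args) in body:
            if positive and pred not in open_preds:
                safe |= {t for t in args if is_var(t)}
        # Every variable of an open or negative literal must be bound.
        for positive, (pred, args) in body:
            if (not positive or pred in open_preds) and any(
                is_var(t) and t not in safe for t in args
            ):
                return False
    return True


# p(a, X) <- not q(X), with q open (the program of Example 3).
example3 = [(("p", ("a", "X")), ((False, ("q", ("X",))),))]
# p(a) <- q(X), not q(a), with q open (the program of Example 19).
example19 = [(("p", ("a",)), ((True, ("q", ("X",))), (False, ("q", ("a",)))))]

print(is_range_restricted(example3, {"q"}))
print(is_range_restricted(example19, {"q"}))
```

The first program is range restricted because X occurs in the head; the second is not, since X occurs only in the open literal q(X).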

Moreover, an open support for a ground atom A w.r.t. an open program
$\Omega = \langle P, F, O \rangle$ and $P' \in \mathrm{Comp}(\Omega)$, is a goal G obtained by unfolding A in
Ground($P'$), until all the literals in G are either open or negative.

A ground open countersupport for A w.r.t. $\Omega$ and $P'$ is a set of ground
literals K such that

1. each $L \in K$ is the complement of some literal belonging to an open support
of A w.r.t. $\Omega$ and $P'$;

2. conversely, each open support of A w.r.t. $\Omega$ and $P'$ contains a literal whose
complement is in K.

In the open setting, the Failure rule should work no matter how the Herbrand
domain can be extended. This requirement leads to the following definitions.

Let a $P'$-substitution be a substitution whose range is contained in the
language of $P'$.

A (non-ground) open countersupport of an arbitrary atom A w.r.t.
$\Omega = \langle P, F, O \rangle$ is a pair $(K, \theta)$, where $\theta$ is a P-substitution, such that for all
$P' \in \mathrm{Comp}(\Omega)$ and all grounding $P'$-substitutions $\sigma$, $K\sigma$ is a ground open
countersupport for $A\theta\sigma$ w.r.t. $\Omega$ and $P'$.

Example 8. Consider again the open program of Example 3. Atom $p(a, b)$ has
one open support, $\neg q(b)$, and a ground open countersupport $\{q(b)\}$ (w.r.t. $P_2$
and $P_4$). Moreover, $p(Y, X)$ has a nonground countersupport $(\{q(X)\}, \{Y = a\})$
(according to the intuition that $p(a, X)$ is false whenever $q(X)$ is true). □

Example 9. In [4], an identically empty (or inconsistent) policy template is
presented as an example of verification of partially specified policies. The
corresponding open program (where irrelevant details have been simplified away) has
the following structure:

$P = \{\ p_1(X) \leftarrow p_2(X), \neg r(X),$
$\quad\ \ p_1(X) \leftarrow p_3(X), r(X),$
$\quad\ \ p_2(X) \leftarrow q(X), r(X),$
$\quad\ \ p_3(X) \leftarrow q(X), \neg r(X) \ \}$,

F = an infinite set of identifiers,

$O = \{q, r\}$.

The atom $p_1(X)$ has the following open countersupports, where $\varepsilon$ denotes the
empty substitution:

$(\{\neg q(X)\}, \varepsilon)$, $(\{r(X)\}, \varepsilon)$, $(\{\neg r(X)\}, \varepsilon)$.

Note that countersupports may contain negative literals, because open atoms
are treated like negative literals during open support computation. □

In the rest of the paper, we assume a function $\mathrm{CounterSupp}_\Omega$ that maps
each atom A onto a (possibly empty) set of (non-ground) open countersupports
for A w.r.t. $\Omega$. By analogy with the original skeptical resolution calculus,
$\mathrm{CounterSupp}_\Omega$ is an abstract model of the actual implementation of negation
as failure (possibly including issues related to loop-checking, or tabulation and
delay), largely independent of particular implementation choices (cf. [3]).

Definition 10 (OSK-Derivations). An open skeptical derivation from
$\Omega = \langle P, F, O \rangle$ (OSK-derivation, for short) is a skeptical derivation from P where the
Failure rule is based upon $\mathrm{CounterSupp}_\Omega$, and is never applied to any open atom.

Example 11. Consider again the open program and the open countersupports
illustrated in Example 9. The following is a formalized version of the proof that
the partially specified policy is inconsistent. Each line shows an s-goal,
followed by the rule applied to it.

$(\neg p_1(X) \mid\ )$                                      Failure, using $(\{r(X)\}, \varepsilon)$
$(r(X) \mid\ )$                                             Split
$(r(X) \mid r(X))\ (\neg p_1(X) \mid \neg r(X))$            Resol. with hyp.
$(\Box \mid r(X))\ (\neg p_1(X) \mid \neg r(X))$            Success
$(\neg p_1(X) \mid \neg r(X))$                              Failure, using $(\{\neg r(X)\}, \varepsilon)$
$(\neg r(X) \mid \neg r(X))$                                Resol. with hyp.
$(\Box \mid \neg r(X))$                                     Success

The answer substitution is empty, which means that $\Omega \models^{s} \forall X.\neg p_1(X)$
(cf. Theorem 13 below).

Note the mix of negation as failure (Failure rules and countersupports) and 
“classical” reasoning by cases (Split rule), that considers the possible values that 
r{X) may take in different completions. 

Space limitations do not allow more complex examples. Interested readers can 
find a complex policy for a hospital in [4], together with the translation into logic
programs and a clear indication of what predicates are to be left open. Several 
policy verification proofs are included. They are all open skeptical entailment 
problems. □ 



The completeness of open skeptical derivations w.r.t. open entailment
depends on the completeness of $\mathrm{CounterSupp}_\Omega$.

Definition 12. $\mathrm{CounterSupp}_\Omega$ is complete (w.r.t. $\Omega$) if for all ground atoms
$A_g$ and all ground open countersupports K for $A_g$ (w.r.t. $\Omega$ and some
$P' \in \mathrm{Comp}(\Omega)$) there exist an open countersupport $(K', \theta) \in \mathrm{CounterSupp}_\Omega(A)$ and
a substitution $\sigma$ such that $A\theta\sigma = A_g$ and $K'\sigma = K$.



The following theorem states that open skeptical resolution is sound and
complete for open skeptical entailment. Note that the initial goal G is restricted
to the language of P. The other goals cannot be inferred with open skeptical
inference, but resolution would not treat them properly. For example, from
$P = \{p(X)\}$ one could erroneously derive all goals $p(a)$ such that $a \in F$.

Theorem 13. Let G be a simple goal whose symbols occur in P. If G has a
successful OSK-derivation from $\Omega$ with answer substitution $\theta$, then $\Omega \models^{s} \forall G\theta$.
Conversely, if $\mathrm{CounterSupp}_\Omega$ is complete and $\Omega \models^{s} G\sigma$ for some grounding $\sigma$,
then G has a successful OSK-derivation from $\Omega$ with answer substitution $\theta$ more
general than $\sigma$.

In general, implementing a complete function $\mathrm{CounterSupp}_\Omega$ is a nontrivial
(and sometimes impossible) task, and an extensive investigation of this issue
must be deferred to an extended version of the paper. However, we have identified
two interesting special cases where completeness can be easily achieved:

- If $F = \emptyset$ (i.e., the Herbrand domain is completely specified), then computing
open countersupports is not harder than computing standard countersupports
from a normal program. Open supports can be obtained by unfolding
the given atom A in Ground(P) until all the literals are either open or negative.
Ground countersupports can then be obtained by collecting one literal
from each support and negating it. This basic approach can be optimized
in various ways, reducing redundancy, deriving nonground countersupports,
etc.

- Suppose $\langle P, F, O \rangle$ is range restricted and generic,³ that is, the terms
occurring in P are all variables. (This is precisely the kind of programs we
are using to verify the policy templates introduced in [4], cf. Example 9.)
Then the countersupport construction illustrated in the previous point can
be carried out from P rather than Ground(P), and yields a provably complete
function $\mathrm{CounterSupp}_\Omega$.
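The construction described in the two cases above (unfold until only open or negative literals are left, then pick one literal from each support and negate it) can be sketched as follows. This is a simplified, propositional illustration and not the paper's implementation: it assumes that every rule uses the same variable tuple, as in the generic program of Example 9, so atoms can be abbreviated by their predicate names; literals are strings with "~" marking negation; only minimal countersupports are returned; and the names `open_supports` and `countersupports` are hypothetical.

```python
from itertools import product


def complement(lit):
    """Complement of a literal written as 'x' or '~x'."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def open_supports(atom, rules, open_preds, seen=frozenset()):
    """Unfold `atom` until only open or negative literals are left.

    rules maps a predicate name to a list of rule bodies (lists of literals).
    Positive loops are cut: an atom that depends on itself gets no support
    through that path.
    """
    if atom in seen:
        return []
    supports = []
    for body in rules.get(atom, []):
        partial = [frozenset()]
        for lit in body:
            if lit.startswith("~") or lit in open_preds:
                options = [frozenset([lit])]       # leaf: open or negative
            else:
                options = open_supports(lit, rules, open_preds, seen | {atom})
            partial = [s | o for s in partial for o in options]
        supports.extend(partial)
    return supports


def countersupports(supports):
    """Pick one literal from each support and complement it (minimal sets only)."""
    picks = {
        frozenset(complement(lit) for lit in choice)
        for choice in product(*supports)
    }
    return {k for k in picks if not any(other < k for other in picks)}


# Propositional abstraction of the program of Example 9, with O = {q, r}.
rules = {
    "p1": [["p2", "~r"], ["p3", "r"]],
    "p2": [["q", "r"]],
    "p3": [["q", "~r"]],
}
supports = open_supports("p1", rules, {"q", "r"})
print(supports)
print(countersupports(supports))
```

Under these assumptions both open supports of p1 contain q, r and ~r, and the three minimal countersupports are {~q}, {r} and {~r}, matching the open countersupports listed in Example 9.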

Example 14. As an example of nonempty open predicate specifications, consider
an open program $\Omega$ modelling reachability in a directed graph:

$P = \{\ e(X, v) \leftarrow \neg e(v, X),$            (1)
$\quad\ \ l(X, Y) \leftarrow e(X, Y),$                 (2)
$\quad\ \ l(X, Y) \leftarrow e(Y, X),$                 (3)
$\quad\ \ r(X, X),$                                    (4)
$\quad\ \ r(X, Y) \leftarrow l(X, Z), r(Z, Y) \ \}$,   (5)

F = an infinite set of identifiers, not including v,
$O = \{e\}$.

The graph’s edges are specified by the open predicate e. All we know about the
graph is that it has a star-shaped subgraph with central node v, which is directly
connected to each other node X either by an edge $(v, X)$ or by $(X, v)$. The other
predicates model graph connectivity regardless of the edges’ direction. Predicate
$l(X, Y)$ holds if there is an edge between X and Y, in some direction. Predicate r
is the reflexive and transitive closure of l. We can prove that the graph is strongly
connected (in all completions) by carrying out a successful OSK-derivation for
$r(X, Y)$ with empty answer substitution (which means $\Omega \models^{s} \forall X \forall Y.\, r(X, Y)$).
The derivation is the following:

³ We borrow this term from the theory of database queries.
$(r(X, Y) \mid\ )$                                                      Resolution with (5)
$(l(X, Z), r(Z, Y) \mid\ )$                                             Resolution with (2)
$(e(X, Z), r(Z, Y) \mid\ )$                                             Split
$(e(X, Z), r(Z, Y) \mid e(X, Z))\ (r(X, Y) \mid \neg e(X, Z))$          Resolution with hyp.
$(r(Z, Y) \mid e(X, Z))\ (r(X, Y) \mid \neg e(X, Z))$                   Resolution with (4), $\{Z = Y\}$
$(\Box \mid e(X, Y))\ (r(X, Y) \mid \neg e(X, Y))$                      Success
$(r(X, Y) \mid \neg e(X, Y))$                                           Resolution with (5)
$(l(X, Z'), r(Z', Y) \mid \neg e(X, Y))$                                Resolution with (3)
$(e(Z', X), r(Z', Y) \mid \neg e(X, Y))$                                Resolution with (1)
$(\neg e(X, Z'), r(Z', Y) \mid \neg e(X, Y))$                           Resolution with hyp., $\{Z' = Y\}$
$(r(Y, Y) \mid \neg e(X, Y))$                                           Resolution with (4)
$(\Box \mid \neg e(X, Y))$                                              Success
$\Box$

□



5 Restricted Mixed Open Inference of Type I 

A general approach to mixed inference is still an open problem. In this paper 
we sketch a preliminary approach that applies to completely undefined open 
predicates and unbounded domains. More precisely, in the context of an open
program $\langle P, F, O \rangle$, we assume that the predicates in O do not occur in the head
of any rule of P, and F is infinite.

The ground open resolution calculus should be extended with a new rule,
called abduction rule:

\[ \frac{(G_1 \mid H_1) \ldots (G_{i-1} \mid H_{i-1})\ (G', L, G'' \mid H_i)\ (G_{i+1} \mid H_{i+1}) \ldots (G_n \mid H_n)}{(G_1 \mid L, H_1) \ldots (G_{i-1} \mid L, H_{i-1})\ (G', G'' \mid L, H_i)\ (G_{i+1} \mid L, H_{i+1}) \ldots (G_n \mid L, H_n)} \]

where the predicate in L belongs to O, and under the restriction that each
$\{L, H_i\}$ must be consistent ($1 \le i \le n$). Intuitively, open predicates can be
abduced as needed to complete the derivation.

Simple mixed derivations extend OSK-derivations with zero or more instances 
of the abduction rule. 

Definition 15. Simple mixed derivations (SM-derivations, for short) are
recursively defined as follows:

- An OSK-derivation is an SM-derivation.

- If $\mathcal{G}_0, \ldots, \mathcal{G}_n$ is an SM-derivation and the step from $\mathcal{G}_n$ to $\mathcal{G}_{n+1}$ is an
instance of the abduction rule, then $\mathcal{G}_0, \ldots, \mathcal{G}_{n+1}$ is an SM-derivation.

The following theorem states that SM-derivations are sound and complete
under the restrictions stated at the beginning of this section, and the further
assumption that P is call-consistent.

Theorem 16. Let $\Omega = \langle P, F, O \rangle$, where P is call-consistent, F is infinite and
the predicates in O do not occur in the head of any rule of P. Let G be any
ground simple goal. If G has a successful, ground SM-derivation then $\Omega \models^{cs} G$.
Conversely, if $\mathrm{CounterSupp}_\Omega$ is complete and $\Omega \models^{cs} G$, then G has a successful,
ground SM-derivation.

Example 17. Let $\Omega$ be defined as follows:

$P = \{\ p(X) \leftarrow \neg q(X), r(X),$
$\quad\ \ q(X) \leftarrow \neg p(X) \ \}$,

$F = \{a, b\}$,

$O = \{r\}$.

All the completions entailing $\neg r(a)$ entail also $q(a)$. Accordingly, there exists the
following ground SM-derivation:

$(q(a) \mid\ )$                  Resolution with $q(X) \leftarrow \neg p(X)$
$(\neg p(a) \mid\ )$             Failure with $(\{\neg r(a)\}, \varepsilon)$
$(\neg r(a) \mid\ )$             Abduction rule
$(\Box \mid \neg r(a))$          Success

□



It should be possible to remove the restriction to ground derivations by
keeping all the goals with hypotheses of the form $(\Box \mid H)$ in the derivation (e.g., by
“turning off” the Success rule), and performing a final check that all such H can
be instantiated to consistent sets of hypotheses using the symbols in F.

Example 18. A nonground version of the derivation illustrated in the previous
example, starting with $(q(X) \mid\ )$, would terminate with the goal $(\Box \mid \neg r(X))$.
Clearly, the hypothesis $\neg r(X)$ can be consistently instantiated using the
constants in F. □



Example 19. Let $P = \{p(a) \leftarrow q(X), \neg q(a)\}$ and $O = \{q\}$. Consider the
SM-derivation

$(p(a) \mid\ )$                        Resolution with $p(a) \leftarrow q(X), \neg q(a)$
$(q(X), \neg q(a) \mid\ )$             Abduction rule
$(\neg q(a) \mid q(X))$                Abduction rule
$(\Box \mid q(X), \neg q(a))$          Success
$\Box$

We have $\Omega \models^{cs} p(a)$ iff F is not empty. Accordingly, the final hypotheses
$q(X), \neg q(a)$ can be instantiated to a consistent set iff $F \neq \emptyset$. □

Similarly, it should be possible to remove the restriction to completely
undefined open predicates by closing the hypotheses H under the partial definitions
of open programs during the final check.

The final check is in fact a particular constraint satisfaction problem. Detailed 
solutions to this problem are interesting subjects for further research. 



6 Final Discussion and Related Work 

The definitions of open programs and open entailment can be immediately ex- 
tended from logic programs to all nonmonotonic logics. On the contrary, the 
proof techniques based on skeptical resolution are tailored to logic programs. 
For a more general approach, other calculi (and extensions thereof) should be 
considered (e.g., [1,5,11]). 

At the current stage of investigation, we see no appealing way of approaching 
open entailment with credulous engines or calculi, because these techniques need 
to instantiate the theory. This is in contrast with the need of handling (possibly 
unbounded) open domains. On closed domains, we are planning an experimental 
comparison of credulous and skeptical approaches. The latter might be more 
efficient on open programs due to their goal-directed nature, which might focus
proof efforts on relevant completions.

The theoretical investigation of open programs and entailment is still in a 
very early stage, and many interesting questions are to be answered. More work 
is needed to obtain more general solutions to the entailment problems. Impor- 
tant (and partially related) issues such as the computational complexity of open 
entailment, expressiveness (i.e., which classes of properties can be checked via 
open entailment and skeptical resolution), syntactic restrictions on completions 
(e.g., restricting completions to stratified programs) have not yet been explored. 

Moreover, there is some interesting related literature whose relationships with 
our work have not yet been investigated. 

In [9], a semantic approach to reasoning with open domains was introduced. 
In the most optimistic perspective, open skeptical resolution might eventually 
be adapted to reason with the programs introduced in [9]. Another approach 
compatible with open domains can be found in [11]. Both works support first- 
order quantification. 

The original notion of open programs (e.g., [6]) adopted a fixed underlying 
universe and was introduced for characterizing a compositional semantics for 
logic programs. Some of those results could be of use for understanding inherent 
limitations of open entailment. 

Later on, in [7], the term “open logic program” has been used in a framework
for integrating logic programming and classical first-order logic. There, open
predicates are those defined by a classical first-order theory. The alphabet is
fixed.

Abstract. We present a method to learn simultaneously definitions for
a concept and its negation. This problem is relevant when we have to 
deal with a complex domain where it is difficult to acquire a complete 
theory and where we have to reason from incomplete knowledge. We use 
default logic to represent such incomplete theories. This paper specifies
the problem of learning a default theory from a set of examples and a 
background knowledge. We propose an operational method to inductively 
construct such a theory. Our learning process relies on a generalization 
mechanism defined in the field of Inductive Logic Programming. We first 
consider the case where the initial knowledge is sure because it contains 
only ground facts. Then, we extend the framework to the case where the 
initial knowledge is a default theory. 



1 Introduction 

We present here a method for constructing a default theory from a set of
positive and negative examples and an initial background knowledge. The
learning process that we propose is strongly related to research in the field of
Inductive Logic Programming (ILP). ILP investigates theory and methods to
induce first-order clausal theories from examples and background knowledge [18].
More precisely, in the normal framework of ILP, if B is the background knowledge
and $E^+$ and $E^-$ are the sets of positive and negative examples respectively,
the aim is to induce hypotheses H such that $B \wedge H \models E^+$ and $B \wedge H \wedge E^- \not\models \bot$.
In most cases, $E^+$ and $E^-$ are examples of a single target predicate and B and H
are definite Horn clauses (but some systems [6] induce full first-order theories).
The ILP community has also considered the problem of using more expressive 
formalisms, especially in systems that construct clauses containing the negation
as failure operator [1,2,15]. 

The problem that we consider in this paper extends [9] and concerns the
simultaneous learning of definitions for a predicate p and its negation $\neg p$. So in our
framework, negative examples for a predicate p will play the full role of leading
to explicit definitions of $\neg p$. The relevance of this approach has been first pointed
out by De Raedt [5] who argued that the closed world assumption is not suited 
to the learning paradigm because we cannot assume that everything is known. 
Our proposition follows the same idea and is concerned with the construction
of theories where it seems difficult to apply the closed world assumption. Let us
imagine a secretary-agent that must learn from observations when it must pass 
on a phone call to the manager. This concept seems difficult to define completely. 
So it is a good representation for the agent to define explicitly situations where 
the call can be passed on, and situations where the manager must not be dis- 
turbed. This formalism enables the agent to recognize cases where the concept 
remains undefined according to the current learned theory. A situation may also 
be undetermined because it satisfies at the same time a positive and a negative 
definition. 

In order to give explicit definitions of p and $\neg p$ and to deal with possible
inconsistencies between them, we propose to represent the learned knowledge by
a default theory. Default logic [22] is a powerful language to represent incomplete
knowledge, which enables our method to obtain compact theories where the
relationships between the definitions for p and $\neg p$ appear clearly. In default
logic, knowledge is represented by a default theory $(W, D)$, where W is a set of
classical formulas (the sure knowledge) and D is a set of default rules (or defaults)
that represent non completely specified inference rules, often considered as rules
with exceptions. Formally, a default $\frac{\alpha : \beta}{\gamma}$ has a consequent $\gamma$ and two types of
antecedents: a prerequisite $\alpha$ and a justification $\beta$. Then, the intuitive meaning
of a default rule is: “if $\alpha$ is proved, and if $\neg\beta$ is not deducible (in other words if
$\beta$ is coherent) then conclude $\gamma$”. In whole generality, $\alpha$, $\beta$ and $\gamma$ can be any first
order logic formula. But in our work, they will be formulas with free variables,
like $p(X, Y)$, so our defaults are said to be open. As usual in default logic, each
formula $p(X, Y)$ represents the set of all ground formulas $p(a, b)$ that can be
obtained by instantiation with the constants of the domain. In this work, we
only consider finite domains (without function symbols) and then our set of open
defaults is in fact a compact representation of a finite set of closed defaults
(without free variables) obtained by instantiation over the constant set.

We recall below the definition of an extension, that is, a set of plausible
conclusions inferred from a given closed default theory (see [22] for more
details on default logic). A default theory is said to be closed if all its
defaults are closed.

Definition 1. [22] Let $(W, D)$ be a closed default theory. For any set of
closed formulas $S$, let $\Gamma(S)$ be the smallest set satisfying:

- $W \subseteq \Gamma(S)$

- $Th(\Gamma(S)) = \Gamma(S)$

- For any $\frac{\alpha : \beta}{\gamma} \in D$, if $\alpha \in \Gamma(S)$ and $\neg\beta \notin S$, then $\gamma \in \Gamma(S)$.

A set of closed formulas $E$ is an extension of $(W, D)$ iff $E = \Gamma(E)$.

A fundamental feature of default logic is its ability to represent incomplete
knowledge, so it is not surprising that a default theory may have multiple
extensions: one for each point of view that we can adopt in the face of the
missing information. For instance
$(W, D) = \left(\{a\}, \left\{\frac{a : b}{c}, \frac{a : \neg c}{\neg b}\right\}\right)$
has two extensions $E_1 = Th(W \cup \{c\})$ and $E_2 = Th(W \cup \{\neg b\})$.
That is why it is necessary to distinguish between skeptical (or cautious)
theorems and credulous theorems. The former are formulas that occur in every
extension ($c \vee \neg b$ in our previous example) and can be considered as
sure deductions. The latter are formulas that occur in at least one extension
($c$ in our previous example) and are only hypothetical conclusions. As will be
described later, this distinction is central in the paradigm that we present in
our work.

^1 If $\delta$ is a default rule, $pre(\delta)$, $jus(\delta)$ and $cons(\delta)$ respectively denote the prerequisite,
the justification and the consequent of $\delta$.

The rest of the paper is organised as follows: Section 2 considers default
learning in the case where the initial theory does not contain defaults. Our
methodology is illustrated on examples in Section 3. In Section 4, we develop
the more general framework of learning with an initial theory that contains
defaults. Then, we compare our work with other approaches in Section 5.

2 Learning Default Theories

2.1 Definition and Algorithm 

The following definition formally specifies the framework of learning a default
theory; it is inspired by a well known semantic specification of ILP. In this
section, we consider the special case where the initial background knowledge is
expressed by ground facts.

Definition 2. Let $E^+ = \{p(a_1), \ldots, p(a_n)\}$ be a set of positive
examples and $E^- = \{\neg p(a'_1), \ldots, \neg p(a'_m)\}$ a set of negative
examples of a target predicate $p$. Let $W$ be an initial set of ground facts
containing no occurrence of $p$ or $\neg p$. Learning a default theory for the
concept described by $p$ and $\neg p$ consists in finding a default theory
$(W', D')$ such that:

- $D'$ is a set of defaults, the consequents of which are $p$ or $\neg p$

- $W' = W \cup E_p$, where $E_p$ is a set of examples that cannot be generalized

- $\left(\bigwedge_{e \in E^+} e\right) \wedge \left(\bigwedge_{e \in E^-} e\right)$ is a skeptical theorem of $(W', D')$.

Definition 3. An example $e$ ($p(a)$ or $\neg p(a)$) is covered by a default
theory $(W, D)$ if $e$ is a credulous theorem of $(W, D)$.

An example $e$ ($p(a)$ or $\neg p(a)$) is an exception to a default theory
$(W, D)$ if $\neg e$ is a credulous theorem of $(W, D)$.

Our approach considers that the training examples constitute sure knowledge
from which we induce default rules. As a default theory may have multiple
mutually inconsistent extensions, our definition requires that the training
examples become skeptical theorems of the induced default theory. For instance,
let $E^+ = \{flies(1)\}$, $E^- = \{\neg flies(2)\}$ and
$W = \{bird(1), bird(2), penguin(2)\}$, and let $D'$ be the default set
$D' = \left\{\frac{bird(X) : flies(X)}{flies(X)}, \frac{penguin(X) : \neg flies(X)}{\neg flies(X)}\right\}$.
The theory $(W, D')$ does not satisfy definition 2. In fact, $(W, D')$ has two
extensions $E_1 = Th(W \cup \{flies(1), flies(2)\})$ and
$E_2 = Th(W \cup \{flies(1), \neg flies(2)\})$ and consequently,
$\neg flies(2)$ is not a skeptical theorem. The reader can easily check that if
we take
$D' = \left\{\frac{bird(X) : flies(X) \wedge \neg penguin(X)}{flies(X)}, \frac{penguin(X) : \neg flies(X)}{\neg flies(X)}\right\}$,
then $(W, D')$ is a solution to this simple learning problem.
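
The check suggested above can be carried out mechanically on such small theories. The following Python fragment is only a minimal sketch, not the procedure used in this paper: it assumes that $W$ contains ground literals only and that prerequisites and justifications are conjunctions of ground literals, so that deduction reduces to set membership. The names `neg`, `gamma`, `extensions` and `ground` are illustrative, and the extensions are found by brute-force enumeration of candidate sets of consequents.

```python
from itertools import combinations

def neg(lit):
    """Complement of a ground literal written as a string ('-' marks negation)."""
    return lit[1:] if lit.startswith("-") else "-" + lit

def gamma(W, D, S):
    """Operator of Definition 1, restricted to sets of ground literals."""
    out = set(W)
    changed = True
    while changed:
        changed = False
        for pre, jus, cons in D:
            if cons in out:
                continue
            # Applicable: prerequisite derived, no justification conjunct refuted by S.
            if pre <= out and all(neg(j) not in S for j in jus):
                out.add(cons)
                changed = True
    return out

def extensions(W, D):
    """Guess a set of applied defaults, then keep the fixpoints S = gamma(S)."""
    found = []
    for k in range(len(D) + 1):
        for G in combinations(D, k):
            S = set(W) | {cons for _, _, cons in G}
            if any(neg(l) in S for l in S):
                continue  # inconsistent candidate (W itself is assumed consistent)
            if gamma(W, D, S) == S and S not in found:
                found.append(S)
    return found

def ground(template, constants):
    """Instantiate an open default over the constants of the finite domain."""
    pre, jus, cons = template
    return [(frozenset(p.format(X=c) for p in pre),
             frozenset(j.format(X=c) for j in jus),
             cons.format(X=c)) for c in constants]

def skeptical(lit, exts):
    return all(lit in E for E in exts)

def credulous(lit, exts):
    return any(lit in E for E in exts)

W = {"bird(1)", "bird(2)", "penguin(2)"}
consts = [1, 2]
penguin_rule = ground(({"penguin({X})"}, {"-flies({X})"}, "-flies({X})"), consts)
D_bad = ground(({"bird({X})"}, {"flies({X})"}, "flies({X})"), consts) + penguin_rule
D_good = ground(({"bird({X})"}, {"flies({X})", "-penguin({X})"}, "flies({X})"),
                consts) + penguin_rule

exts_bad = extensions(W, D_bad)    # two extensions: flies(2) or -flies(2)
exts_good = extensions(W, D_good)  # a single extension
```

With the first default set, `-flies(2)` is credulous but not skeptical; with the second, both `flies(1)` and `-flies(2)` belong to the unique extension, as definition 2 requires.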
The main idea is to give symmetric roles to positive and negative examples: the
positive examples are used to build defaults defining $p$ and the negative
examples are used to build defaults defining $\neg p$. Generalization of
positive examples leads to a general rule defining $p$, but this definition may
admit exceptions, which are found by examining the negative examples.
Generalization from such a set of exceptions makes it possible to specialize
the rule defining $p$ and moreover gives a general definition of $\neg p$. A
symmetric treatment is also applied to the generalization of negative examples.
To summarize, our method to construct a set of defaults alternates
generalization and specialization steps.

The generalization process is based on a generic ILP algorithm named here
Gen($q$, $E^+$, $T$, $\varphi$) that, from a set of positive examples $E^+$ of
the predicate $q$ and a background theory $T$, induces one definition
$\varphi(X)$ that characterizes a part of $E^+$. More precisely, this means
that the theory $T$ and the clause ($q(X)$ :- $\varphi(X)$) make it possible to
prove all the examples $q(a)$ generalized by $\varphi$ (see section 3 for
details).

Formally, the algorithms that we propose are the following.



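Algorithm DefaultLearning

In: $p(X)$, $W$, $E^+$, $E^-$    Out: $W'$, $D'$

Begin
    $W' \leftarrow W$    $D' \leftarrow \emptyset$
    $\overline{E^+} \leftarrow E^+$
    While $\overline{E^+} \neq \emptyset$
        Gen($p$, $\overline{E^+}$, $W$, $\varphi$) searches $\varphi$ that generalizes a part of $\overline{E^+}$
        If a formula $\varphi$ is found, then
            Add to $D'$ the default $\delta = \frac{\varphi(X) : p(X)}{p(X)}$
            Remove from $\overline{E^+}$ the examples generalized by $\varphi$
            $Exc \leftarrow \{e \in E^- \mid e$ is an exception to $(W', D')\}$
            If $Exc \neq \emptyset$ then Specialise($Exc$, $W$, $\delta$, $W'$, $D'$, $\{pre(\delta)\}$)
        else
            $W' \leftarrow W' \cup \overline{E^+}$    $\overline{E^+} \leftarrow \emptyset$
    Endwhile
    $JUS(X) \leftarrow \bigwedge \neg pre(\delta)$, for all $\delta \in D'$ s.t. $cons(\delta) = p(X)$
    $\overline{E^-} \leftarrow \{e \in E^- \mid e$ is not covered by $(W', D')\}$
    While $\overline{E^-} \neq \emptyset$
        Gen($\neg p$, $\overline{E^-}$, $W$, $\varphi$) searches $\varphi$ that generalizes a part of $\overline{E^-}$
        If a formula $\varphi$ is found, then
            Add to $D'$ the default $\delta = \frac{\varphi(X) : \neg p(X) \wedge JUS(X)}{\neg p(X)}$
            Simplify $JUS(X) = \bigwedge_{i=1}^{n} \neg J_i(X)$ by removing each $J_i(X)$ s.t. there is no constant
            tuple $X$ satisfying $p(X) \in E^+$ and $W \vdash \varphi(X) \wedge J_i(X)$
            Remove from $\overline{E^-}$ the examples generalized by $\varphi$
        else
            $W' \leftarrow W' \cup \overline{E^-}$    $\overline{E^-} \leftarrow \emptyset$
    Endwhile
End

Algorithm Specialise

In: $Exc$, $W$;  InOut: $\delta$, $W'$, $D'$;  In: $ForbForm$

Begin
    $\overline{Exc} \leftarrow Exc$
    While $\overline{Exc} \neq \emptyset$
        Gen($\neg cons(\delta)$, $\overline{Exc}$, $W$, $\varphi_{Exc}$) searches $\varphi_{Exc}$ that generalizes a part of $\overline{Exc}$
        If $\varphi_{Exc}$ is found and $\varphi_{Exc} \notin ForbForm$, then
            $jus(\delta) \leftarrow jus(\delta) \wedge \neg\varphi_{Exc}$
            Add to $D'$ the default $\delta_{Exc} = \frac{\varphi_{Exc} : \neg cons(\delta)}{\neg cons(\delta)}$
            Remove from $\overline{Exc}$ the examples generalized by $\varphi_{Exc}$
            /* $E$ stands for $E^+$ (resp. $E^-$) if $cons(\delta) = p(X)$ (resp. $cons(\delta) = \neg p(X)$) */
            $Exc_{Exc} \leftarrow \{e \in E \mid e$ is an exception to $(W', D')\}$
            If $Exc_{Exc} \neq \emptyset$ then Specialise($Exc_{Exc}$, $W$, $\delta_{Exc}$, $W'$, $D'$, $ForbForm \cup \{pre(\delta_{Exc})\}$)
        else
            $W' \leftarrow W' \cup \overline{Exc}$    $\overline{Exc} \leftarrow \emptyset$
    Endwhile
End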



In our main algorithm DefaultLearning, $p$ stands for the predicate to learn;
$E^+$ and $E^-$ are the sets of positive and negative examples of the concept;
$W$ is the initial theory; $W'$ is the theory $W$, possibly augmented by some
examples that cannot be generalized; $D'$ is a set of defaults the consequents
of which are $p$ and $\neg p$. For clarity, the algorithm is written by
assuming that we begin by learning $p$. But as our method deals with positive
and negative examples in a symmetric manner, it could as well begin by learning
$\neg p$ by exchanging the roles of $E^+$ and $E^-$.

The process starts with a generalization step: the learning algorithm Gen is
applied to $\overline{E^+}$ in order to compute one formula $\varphi$ that
represents a subset of $\overline{E^+}$. If it is possible to find such a
formula, the default $\delta = \frac{\varphi(X) : p(X)}{p(X)}$ is added to $D'$.

This default $\delta$ may admit exceptions (see definition 3). The set of
exceptions is obtained by checking for each $\neg p(e)$ in $E^-$ whether $p(e)$
is a credulous theorem of $(W', D')$. If the set of exceptions is not empty, we
must specialize $\delta$. Gen is used to induce a formula $\psi$ that
generalizes these exceptions and we modify the default $\delta$ into
$\frac{\varphi(X) : p(X) \wedge \neg\psi(X)}{p(X)}$. In this way, this default
is no longer applicable to the negative examples that verify $\psi(X)$. At the
same time, these negative examples, generalized by $\psi(X)$, lead to a general
definition of $\neg p$, represented by the default
$\frac{\psi(X) : \neg p(X)}{\neg p(X)}$. This default is specialized in its
turn if necessary. This recursive process always ends because we use the set of
forbidden formulas, $ForbForm$, which avoids possible loops in the situations
where exceptions and examples are generalized by the same formula. For
instance, with $E^+ = \{flies(1), flies(2)\}$,
$E^- = \{\neg flies(3), \neg flies(4)\}$ and
$W = \{bird(1), bird(2), bird(3), bird(4)\}$, we obtain the first default
$\delta_1 = \frac{bird(X) : flies(X)}{flies(X)}$, which has the exceptions
$E^-$. If we specialize $\delta_1$ without taking into account $ForbForm$, we
obtain the new default $\delta_2 = \frac{bird(X) : \neg flies(X)}{\neg flies(X)}$,
and $\delta_1$ is specialized into
$\delta'_1 = \frac{bird(X) : flies(X) \wedge \neg bird(X)}{flies(X)}$.
This is not acceptable because the positive examples $flies(1)$, $flies(2)$ are
no longer covered by this theory, and become exceptions to $\delta_2$, which
leads to a loop in this recursive specialization. The use of $ForbForm$ makes
it possible to find the final theory
$\left(W \cup \{\neg flies(3), \neg flies(4)\}, \left\{\frac{bird(X) : flies(X)}{flies(X)}\right\}\right)$,
because an example $e$ that cannot be generalized is simply added to $W'$ as a
ground fact. This ensures that $e$ is a skeptical theorem of $(W', D')$.
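
The role of $ForbForm$ in this example can be imitated with a toy covering loop. The sketch below is an illustration under strong assumptions, not the algorithm of this paper: `gen` is a hypothetical stand-in for Gen that only proposes a single unary predicate of $W$ as prerequisite (an ILP system such as Progol searches a much richer space), examples are represented by their constants, and `cover` merely mimics the While loops that either generalize the remaining examples or add them to $W'$.

```python
def gen(examples, facts, forbidden=frozenset()):
    """Hypothetical stand-in for Gen: pick the non-forbidden unary predicate
    of `facts` that holds for the largest number of the given constants."""
    best, best_cov = None, set()
    for pred in sorted({p for p, _ in facts} - set(forbidden)):
        cov = {c for c in examples if (pred, c) in facts}
        if len(cov) > len(best_cov):
            best, best_cov = pred, cov
    return best, best_cov

def cover(examples, facts, forbidden=frozenset()):
    """Iterative covering: generalize what can be generalized and return the
    rest, which the learning algorithm would add to W' as ground facts."""
    remaining = set(examples)
    prerequisites, ungeneralized = [], set()
    while remaining:
        pred, cov = gen(remaining, facts, forbidden)
        if pred is None:
            ungeneralized |= remaining
            remaining = set()
        else:
            prerequisites.append(pred)
            remaining -= cov
    return prerequisites, ungeneralized

W = {("bird", c) for c in (1, 2, 3, 4)}

# Positive examples flies(1), flies(2): generalized by bird(X).
pos_prereqs, pos_left = cover({1, 2}, W)

# Exceptions -flies(3), -flies(4): bird(X) is forbidden because it is the
# prerequisite of the default being specialized, so nothing is found.
exc_prereqs, exc_left = cover({3, 4}, W, forbidden={"bird"})
```

Here `exc_left` contains the constants 3 and 4, which corresponds to adding $\neg flies(3)$ and $\neg flies(4)$ to $W'$ instead of looping between $bird(X)$ and its own exceptions.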

When all the positive examples are generalized (first Endwhile), we check
whether there are still any negative examples not covered by the current
theory. If this is the case, we begin to complete the definition of $\neg p$ by
a similar process. At this time, all the positive examples, which are the
potential exceptions for defaults defining $\neg p$, have already been treated.
So the formulas that characterize these possible exceptions have already been
computed: they are the prerequisites of some defaults defining $p$. That is why
all the new defaults that are introduced for $\neg p$ are constrained by a
justification $JUS$ that is the conjunction of the negations of the
prerequisites of all defaults concluding $p$. In this way, we avoid the
computation of exceptions, which is an expensive process. The drawback of this
strategy is that the justifications of these last defaults are certainly too
complex, so they are simplified by a mechanism that checks for each new default
whether these formulas really correspond to some exceptions.

A last point to notice is that in our algorithm the sets of examples that are
not yet covered ($\overline{E^+}$ and $\overline{E^-}$) decrease each time that
Gen generalizes a subset of examples: this is the principle of iterative
covering common to many learning algorithms. But when we have to determine the
exceptions to a current theory, we must take into account the initial sets of
examples $E^+$ and $E^-$. This is necessary to be sure that we have found all
the possible exceptions.

2.2 Correctness 

The work presented here extends a previous method [9] that concerned only
Łukaszewicz' default theories, where the existence of an extension is
guaranteed. In Reiter's default logic, this point must be studied more
carefully.

Theorem 1. The algorithm DefaultLearning induces a default theory that always
has an extension.

Proof: In [14] it is shown that a Reiter's default theory has at least one
extension if its block-graph contains only even cycles. For a default theory
$(W, D)$, the block-graph is a pair $(\overline{D}, A)$. The vertex set
$\overline{D}$ contains all closed defaults obtained from $D$ except those that
are incompatible with $W$, i.e. defaults $\delta$ s.t.
$W \vdash \neg jus(\delta)$. In our particular case, the arc set $A$ contains
the pair $(\delta, \delta')$ if $\delta$ "blocks" $\delta'$, i.e.
$W \vdash pre(\delta)$ and $W \cup cons(\delta) \vdash \neg jus(\delta')$. By
construction, each induced default is normal or semi-normal^2 and its
consequent is $p(X)$ or $\neg p(X)$. So, it is obvious that only even cycles
may exist, and then our learned default theories always have an extension. □

When it ends, our algorithm guarantees that all examples are covered (each
given example $e$ is a credulous theorem) and that there are no remaining
exceptions (for each given example $e$, $\neg e$ is not a credulous theorem).
So we have to prove that this is sufficient to make all the examples skeptical
theorems, as required by definition 2.

Theorem 2. Let $(W, D)$ be a default theory induced by the algorithm
DefaultLearning and $e$ a given example.

If $e$ is a credulous theorem of $(W, D)$ and $\neg e$ is not a credulous
theorem of $(W, D)$, then $e$ is a skeptical theorem of $(W, D)$.

Proof: Without loss of generality we fix that $e$ is a positive example $p(a)$
(the proof for a negative example is similar) such that $p(a)$ is a credulous
theorem and $\neg p(a)$ is not a credulous theorem. Since $p(a)$ is a credulous
theorem, there exists a closed default
$\delta = \frac{\alpha(a) : p(a) \wedge \beta(a)}{p(a)}$ (in particular we may
have $\beta(a) = true$) with $W \vdash \alpha(a)$.

Let us suppose that $p(a)$ is not a skeptical theorem. In other words, there
exists an extension $E$ not containing $p(a)$; then $\delta$ is blocked in $E$,
that is $E \vdash \neg p(a) \vee \neg\beta(a)$. Since $\neg p(a)$ is not a
credulous theorem, it is never possible to obtain $\neg p(a)$, so the only way
to block $\delta$ is to derive $\neg\beta(a)$. $\neg\beta(a)$ cannot be
obtained by a default $\delta'$ because its consequent can only be $p(X)$. So,
we must have $\neg\beta(a) \in W$. But in this case, $\delta$ is always blocked
and $p(a)$ is not a credulous theorem. This contradiction gives our result. □

^2 A default is semi-normal if it is of the form $\frac{\alpha : \beta \wedge \gamma}{\gamma}$.

The next section gives examples illustrating our methodology. 

3 Commented Examples 

In order to test the relevance of our method, we have simulated its main steps 
on some artificial examples. It is fundamental in our work to compute general- 
ization formulas that may have exceptions. This can be realized in ILP systems 
(like FOIL [21] for instance) by allowing a certain level of noise. But it is
difficult to adjust this parameter: if we accept a high level of noise, we find
formulas that are too general; if the level of noise is too low, the
generalization is too specific or impossible because of the exceptions. To
avoid this difficulty, which must be studied carefully for each application
domain, we use a generalization mechanism that relies only on positive
examples. The ILP system Progol [16] has the ability
to learn from positive data only [17] and we use it as the generalization tool 
described by the function Gen in our algorithm. Progol is an ILP system based 
on inverse entailment. The input file for Progol specifies the set of positive ex- 
amples and the initial background theory that may contain definite Horn clauses 
but also integrity constraints expressed by headless Horn clauses. Moreover, the 
user specifies type and mode declarations for the predicates. These biases are 
very important to determine the space of possible generalizations that Progol 
searches with an A*-like algorithm in order to return a clause that realizes the 
best data compression. 

In the following examples, the different stages of our method have been
simulated by switching between learning steps of $p$ and $\neg p$. When
learning $p$, the theory with only $E^+$ was considered, and in order to learn
$\neg p$, the negative examples are considered with $\neg p$ renamed into an
ad hoc predicate $not\_p$. The covering tests, which are necessary to determine
which examples are not yet generalized and also to determine exceptions to a
default, require either extension computation or query answering in Reiter's
default logic. For both tasks, operational systems exist (for instance DeRes
[4], GADEL [19], XRay [20]), and they could be integrated in a complete system
for default theory learning.
Example 1. The initial theory $W$ concerns a set of people and a set of dishes^3.

$W = \{$ $hb(1), \ldots, hb(45), hb(46), \ldots, hb(50), hb(51), \ldots, hb(55)$,
         $v(46), \ldots, v(50)$, $diab(51), \ldots, diab(55)$,
         $a(mutton), a(beef), a(fish)$,
         $di(mutton), di(beef), di(fish)$,
         $oa(egg), oa(milk), di(egg), di(milk)$,
         $sug(ice\_cream), sug(cake), di(ice\_cream), di(cake)$ $\}$

The aim is to induce what people eat and what they do not eat from the
following sets of examples.

$E^+ = \{$ $eats(2, egg), \ldots, eats(50, egg)$,
           $eats(1, milk), \ldots, eats(50, milk)$,
           $eats(1, mutton), \ldots, eats(45, mutton)$,
           $eats(1, beef), \ldots, eats(45, beef)$,
           $eats(1, fish), \ldots, eats(45, fish)$ $\}$

$E^- = \{$ $\neg eats(1, egg)$,
           $\neg eats(46, mutton), \ldots, \neg eats(50, mutton)$,
           $\neg eats(46, beef), \ldots, \neg eats(50, beef)$,
           $\neg eats(46, fish), \ldots, \neg eats(50, fish)$,
           $\neg eats(51, ice\_cream), \ldots, \neg eats(55, ice\_cream)$,
           $\neg eats(51, cake), \ldots, \neg eats(55, cake)$ $\}$

Let us suppose that we begin to learn the definition of $eats(X, Y)$. So we run
Progol in order to generalize from the examples $E^+$ and $W$. The best clause
according to Progol evaluation is ($eats(X, Y)$ :- $hb(X), oa(Y)$), which means
that all the persons eat dishes that have an animal origin (eggs and milk).
From this formula we build into $D'$ a first default
$\delta_1 = \frac{hb(X) \wedge oa(Y) : eats(X, Y)}{eats(X, Y)}$. By examining
the set of negative examples, we find that this default admits only one
exception $\neg eats(1, egg)$, which cannot lead to a relevant generalization.
So this exception $\neg eats(1, egg)$ is added to $W'$. There are still some
positive examples that are not covered by $(W', D')$ and a second call to the
generalization of Progol returns the clause ($eats(X, Y)$ :- $hb(X), a(Y)$). So
we build the default
$\delta_2 = \frac{hb(X) \wedge a(Y) : eats(X, Y)}{eats(X, Y)}$. This default
admits a set of exceptions
$Exc_{\delta_2} = \{\neg eats(46, mutton), \ldots, \neg eats(50, fish)\}$. In
order to characterize these exceptions by a general formula, we submit this
subset of examples to Progol (after replacing $\neg eats$ by $not\_eats$).
Progol returns the clause ($not\_eats(X, Y)$ :- $v(X) \wedge a(Y)$). So
$\delta_2$ is specialized into
$\delta'_2 = \frac{hb(X) \wedge a(Y) : eats(X, Y) \wedge \neg(v(X) \wedge a(Y))}{eats(X, Y)}$
and at the same time, we build
$\delta_3 = \frac{v(X) \wedge a(Y) : \neg eats(X, Y)}{\neg eats(X, Y)}$. This
default $\delta_3$ admits no exception.

^3 The following notation is used: hb stands for human_being, v for vegetarian, di for
dish and diab for diabetic; a for animal qualifies dishes that are animal flesh, and
oa for animal_origin qualifies dishes that have an animal origin, sug qualifies sugary
food.

At this moment, we have finished the covering of all the positive examples and
we consider the negative examples that are not yet covered by
$(W', \{\delta_1, \delta'_2, \delta_3\})$, namely
$\{\neg eats(51, ice\_cream), \ldots, \neg eats(55, cake)\}$. To generalize
these instances, Progol finds the formula $(diab(X) \wedge sug(Y))$ and we
build a default $\delta_4$ with this formula as prerequisite. To take into
account the work already done during the learning of the positive part
$eats(X, Y)$, this default has a justification which is the conjunction of the
negated prerequisites of the defaults defining $eats(X, Y)$:
$\delta_4 = \frac{diab(X) \wedge sug(Y) : \neg eats(X, Y) \wedge \neg(hb(X) \wedge oa(Y)) \wedge \neg(hb(X) \wedge a(Y))}{\neg eats(X, Y)}$.
The justification in $\delta_4$ is simplified by checking that there does not
exist a couple $(X, Y)$ such that $eats(X, Y) \in E^+$ and
$(diab(X) \wedge sug(Y))$ and $(hb(X) \wedge oa(Y))$ are true simultaneously.
So $(hb(X) \wedge oa(Y))$ is removed from the justification of $\delta_4$. The
same is true for $(hb(X) \wedge a(Y))$ and finally, we obtain
$\delta'_4 = \frac{diab(X) \wedge sug(Y) : \neg eats(X, Y)}{\neg eats(X, Y)}$.
The simplification process relies on theorem proving in Horn logic and is much
less expensive than the computation of exceptions, which requires theorem
proving in default logic. One can easily check that all the positive and all
the negative examples are skeptical theorems of
$(W', \{\delta_1, \delta'_2, \delta_3, \delta'_4\})$.
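
The simplification test applied to $\delta_4$ only needs the ground facts and the positive examples, so it can be sketched with plain set operations. The following Python fragment is an illustration under assumptions: people are taken to be numbered contiguously from 1 to 55 and the example ranges are taken to be contiguous, and the names `simplify`, `prerequisite` and `justification_parts` are introduced here for the illustration only.

```python
# Ground facts of Example 1 (assumed contiguous numbering of people).
hb = set(range(1, 56))
diab = set(range(51, 56))
oa = {"egg", "milk"}
a = {"mutton", "beef", "fish"}
sug = {"ice_cream", "cake"}

# Positive examples eats(X, Y), again assuming contiguous ranges.
E_pos = ({(x, "egg") for x in range(2, 51)}
         | {(x, "milk") for x in range(1, 51)}
         | {(x, d) for x in range(1, 46) for d in a})

def prerequisite(x, y):
    """Prerequisite of delta_4: diab(X) and sug(Y)."""
    return x in diab and y in sug

# Formulas J_i whose negations appear in the justification JUS of delta_4.
justification_parts = {
    "hb(X) & oa(Y)": lambda x, y: x in hb and y in oa,
    "hb(X) & a(Y)": lambda x, y: x in hb and y in a,
}

def simplify(prereq, parts, positives):
    """Keep J_i only if some positive example satisfies both the
    prerequisite and J_i, i.e. only if it describes a real exception."""
    return [name for name, holds in parts.items()
            if any(prereq(x, y) and holds(x, y) for x, y in positives)]

kept = simplify(prerequisite, justification_parts, E_pos)
```

Under these assumptions `kept` is empty: no positive example concerns a diabetic person and a sugary dish, so both conjuncts are dropped and only $\neg eats(X, Y)$ remains in the justification.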

The following example illustrates that a learned default theory may have 
multiple extensions. 

Example 2. Let us consider that we want to learn the predicate $p$^4 with
$W = \{q(b1), q(b2), q(b3), q(nixon), r(t1), r(t2), r(nixon)\}$,
$E^+ = \{p(b1), p(b2), p(b3)\}$ and $E^- = \{\neg p(t1), \neg p(t2)\}$.

Let us note that $nixon$ is given neither as a positive example nor as a
negative one. So the simplification step applies to the second default,
inducing
$D' = \left\{\frac{q(X) : p(X)}{p(X)}, \frac{r(X) : \neg p(X)}{\neg p(X)}\right\}$.
As required by our definition, the conjunction of all the examples is a
skeptical theorem of $(W, D')$ even if this theory has two distinct extensions
$E_1 = Th(W \cup \{p(b1), p(b2), p(b3), \neg p(t1), \neg p(t2), p(nixon)\})$
and
$E_2 = Th(W \cup \{p(b1), p(b2), p(b3), \neg p(t1), \neg p(t2), \neg p(nixon)\})$.
Knowledge about $nixon$ remains undefined since it is not a training example.

4 Learning with Initial Defaults

In the two previous sections we considered special default theories where $W$
only contains ground facts. This requirement was necessary to make a bridge
between default logic, where the sure knowledge can be expressed by any first
order formula, and ILP, where the initial background knowledge is expressed by
Prolog clauses, which are not equivalent to implications. In the case of
example 1, the whole initial theory is expressed by ground facts, whereas some
general Prolog clause like ($hb(X)$ :- $v(X)$) could have been used. Let us
notice that such an oriented rule could be written in default logic by a
default with prerequisite $v(X)$ and consequent $hb(X)$.

We now consider that we want to learn a new concept from an initial default
theory and a set of examples. The initial default theory may have multiple
extensions, but this difficulty can be resolved if the learning process relies
only on the sure initial knowledge. That is why we propose a method where
generalization uses a background knowledge including only the ground facts that
are skeptical theorems of our initial theory. This new learning problem can be
stated as follows.

Definition 4. Let $E^+$ and $E^-$ be positive and negative examples of a target
predicate $p$. Let $(W_0, D_0)$ be an initial default theory containing no
occurrences of $p$ or $\neg p$. Learning a default theory for the concept
described by $p$ and $\neg p$ consists in building a default theory $(W', D')$
such that:

- $D' = D_0 \cup D_p$, where $D_p$ is a set of defaults, the consequents of which are $p$ or $\neg p$

- $W' = W_0 \cup E_p$, where $E_p$ is a set of examples that cannot be generalized

- $\left(\bigwedge_{e \in E^+} e\right) \wedge \left(\bigwedge_{e \in E^-} e\right)$ is a skeptical theorem of $(W', D')$.

^4 r stands for republican, q for quaker and p for pacifist.

We propose the following method to induce $W'$ and $D'$ in such a case. First,
we compute $\overline{W}$, the set of ground facts that are skeptical theorems
of the initial default theory $(W_0, D_0)$. Then the algorithm to learn the
default theory $(W', D')$ is the same as the one given in subsection 2.1,
except for the two following modifications:

- DefaultLearning works on the inputs: $p(X)$, $\overline{W}$, $W_0$, $D_0$, $E^+$, $E^-$

- the first two initializations $W' \leftarrow W$ and $D' \leftarrow \emptyset$ are replaced by
  $W' \leftarrow W_0$ and $D' \leftarrow D_0$

So the background knowledge used for generalization by Gen is always
$\overline{W}$, the set of skeptical theorems of $(W_0, D_0)$. But each time we
have to compute a set of exceptions, we consider the exceptions of the current
theory $(W', D')$. This current theory contains the initial default theory
$(W_0, D_0)$ augmented by some new defaults defining $p$ or $\neg p$ and
possibly by some examples that cannot be generalized. So, generalization relies
on sure knowledge but the search for exceptions takes into account the
credulous theorems of $(W_0, D_0)$. This is necessary to ensure that each
example will be a skeptical theorem of the resulting default theory $(W', D')$.

Example 3. Let us consider the initial theory $(W_0, D_0)$ with
$W_0 = \{q(b1), q(b2), q(nixon), r(t1), r(t2), r(nixon), usp(nixon), p(john)\}$
and
$D_0 = \left\{\frac{q(X) : p(X)}{p(X)}, \frac{r(X) : \neg p(X)}{\neg p(X)}\right\}$.
Let $E^+ = \{no(b1), no(b2), no(john)\}$^5 and
$E^- = \{\neg no(t1), \neg no(t2), \neg no(nixon)\}$.

The initial theory $(W_0, D_0)$ has two extensions and we consider only the set
$\overline{W}$ of ground facts that are skeptical theorems in order to learn a
definition of $no$. As
$\overline{W} = W_0 \cup \{p(b1), p(b2), \neg p(t1), \neg p(t2)\}$, our method
constructs the first default $\delta_1 = \frac{p(X) : no(X)}{no(X)}$. The
negative example $\neg no(nixon)$ is an exception for $\delta_1$ since there
exists an extension where $\delta_1$ can be applied to $nixon$. A
generalization of this exception leads to the formula $usp(X)$. Then $\delta_1$
is specialized into
$\delta'_1 = \frac{p(X) : no(X) \wedge \neg usp(X)}{no(X)}$. At the same time
we build the default $\delta_2 = \frac{usp(X) : \neg no(X)}{\neg no(X)}$, which
has no exceptions. The learning process completes the definition of $\neg no$
by a further default $\delta_3$ concluding $\neg no(X)$. One can check that
each example is a skeptical theorem of $(W', D')$ with $W' = W_0$ and
$D' = D_0 \cup \{\delta'_1, \delta_2, \delta_3\}$.

This final resulting default theory $(W', D')$ has two extensions because of
the remaining incomplete specification about $p(nixon)$ and $\neg p(nixon)$.
But each of these extensions contains the conclusion $\neg no(nixon)$, as
required by our objective.
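
The two kinds of knowledge used in this section can be told apart with elementary set operations once the extensions are known. The sketch below is only illustrative: the two extensions of $(W_0, D_0)$ are written down by hand as sets of ground literals (they are not computed by a default logic prover), negation is written with a leading `-`, and the helper names `skeptical_facts` and `credulous_facts` are assumptions of this illustration.

```python
W0 = {"q(b1)", "q(b2)", "q(nixon)", "r(t1)", "r(t2)", "r(nixon)",
      "usp(nixon)", "p(john)"}

# Ground literals of the two extensions of (W0, D0), listed by hand:
# they differ only on the pacifism of nixon.
E1 = W0 | {"p(b1)", "p(b2)", "-p(t1)", "-p(t2)", "p(nixon)"}
E2 = W0 | {"p(b1)", "p(b2)", "-p(t1)", "-p(t2)", "-p(nixon)"}

def skeptical_facts(extensions):
    """Ground facts present in every extension (background used by Gen)."""
    return set.intersection(*map(set, extensions))

def credulous_facts(extensions):
    """Ground facts present in at least one extension (used for exceptions)."""
    return set.union(*map(set, extensions))

W_bar = skeptical_facts([E1, E2])
maybe = credulous_facts([E1, E2]) - W_bar
```

Here `W_bar` is $W_0 \cup \{p(b1), p(b2), \neg p(t1), \neg p(t2)\}$, while `maybe` contains exactly `p(nixon)` and `-p(nixon)`: the fact $p(nixon)$ is not available for generalization, but it is what makes $\neg no(nixon)$ an exception to $\delta_1$.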

^5 no stands for nuclear_opponent and usp stands for US President.
5 Related Works

The problem of learning non-monotonic theories by learning both a concept and
its negation has been pointed out as very interesting for many years. In [5] a
concept and its negation are effectively learned, but in the framework of
definite clauses: the negative concept is represented by a new predicate
$not\_p$ and the learning algorithm checks that no contradiction occurs between
the definitions of $p$ and $not\_p$. The framework proposed in [8] learns a
concept and its exceptions by means of general rules, and conflicts between
rules are solved by additional priority relations. This framework captures the
notion of specificity of a rule, as is done in [3] in prioritized default
logic. But it is known [23] that specificity can be handled by means of
semi-normal defaults, and that is exactly what our method does. In [7] the
problem of contradiction between the definitions of $p$ and $\neg p$ is solved
by using integrity constraints in order to restrict the conclusions derivable
from rules that are too general.

More recently, some works deal with this problem in the context of extended
logic programs [11,13,12]. Extended Logic Programs (ELP) have been introduced
by Gelfond and Lifschitz [10] to extend the class of normal logic programs by
allowing explicit negation. A rule in an ELP has the form
$L_0 \leftarrow L_1, \ldots, L_m, not\ L_{m+1}, \ldots, not\ L_n$ where each
$L_i$ is a literal (positive or negative). [11,13] propose methods to learn an
ELP that contains a definition of $p$ and a definition of $\neg p$. Each
definition may have exceptions that are described by abnormality predicates,
and these abnormality predicates are defined by normal clauses. So, the aim of
these works is the same as ours. The main difference is that we do not rely on
abnormality predicates to specialize overgeneral rules. For instance, the
algorithm presented in [13] learns rules for $p$ and specializes them if they
have exceptions, then it computes in the same manner a set of rules for
$\neg p$. For our example 3, the following rules

p(X) :- q(X), not ¬p(X).        ¬p(X) :- r(X), not p(X).
ab1(X) :- usp(X).
no(X) :- p(X), not ab1(X).      ¬no(X) :- usp(X).

are learned. We can observe that the algorithm has dealt twice with the set of
US presidents, once when US presidents are considered as a characterization of
$ab1$ and another time when US presidents are considered as examples of the
concept $\neg no$. This illustrates that using abnormality predicates to
specialize rules hides the deep relationships that exist between the
definitions of $no$ and $\neg no$, and this leads to redundancy in the learning
process and in the resulting rules. Furthermore, the complete theory induced by
[13] would in fact transform the rule defining $no$ into the two rules
(no(X) :- p(X), not ab1(X), not ¬no(X).) and (no(X) :- p(X), undefined(¬no(X)).),
and similarly for the other rules concluding $\neg no(X)$. The well founded
semantics requires these modifications in order to deal correctly with the
examples where the definitions of $no$ and $\neg no$ overlap.

The method we have presented, like those described in [11,13,12], is a method
to build a consistent theory in a non-monotonic framework. The common feature
of our work and those presented in [11,13,12] is to rely on a standard ILP
procedure to compute definitions for the positive and the negative parts of the
concept; then we use a theorem prover for default logic (a theorem prover for
the answer set semantics in [11] and for WFSX semantics in [13,12]) to compute
the potential exceptions to the definitions that have been induced. So, the
central point is to study, for each semantics used, how to aggregate these
definitions into non-monotonic rules able to deal with potential
contradictions.

A more recent work [25,24] presents another approach where the induction of
hypotheses is realized directly from the answer sets of the initial program. So
this work redefines the learning process according to the framework used. In
the case of a background program having multiple answer sets, the author
proposes to learn different rules for each answer set, which is very different
from our proposal of Section 4. Of course, further study and also
experimentation with those formalisms on real problems are necessary to decide
whether induction must rely on credulous or skeptical knowledge.

6 Conclusion 

We have presented a framework to induce default theories from training
examples. Default logic is probably the most general framework that we can
imagine to represent at the same time a concept and its negation, and the
recent tools realized for extension computation or query answering make it
possible to consider its application to some real domains. This paper has shown
how to control the inductive construction of a default theory to ensure that it
correctly represents the knowledge contained in the training examples. The
availability of ILP systems allowed us to check the relevance of this approach
on some artificial examples. We now have to study the generalization process
further, especially on real examples. We think that our method can be the basis
of a system that helps users to formalize their knowledge in default logic.
Abstract. This work concerns the use of default knowledge in concept
learning from positive and negative examples. Two connectives are added
to a description logic, C-CLASSIC, previously defined for concept
learning. The new connectives ($\delta$ and $\epsilon$) make it possible to
express the idea that some properties of a given concept definition are default
properties, and that some properties that should belong to the concept
definition actually do not (these are excepted properties). When performing
concept learning, both hypotheses and examples are expressed in this new
description logic, but prior to learning, a saturation process using default
and non-default rules has to be applied to the examples in order to add default
and excepted properties to their definition. As in the original C-CLASSIC,
disjunctive learning is performed using a standard greedy set covering
algorithm whose generalization operator is the Least Common Subsumer
operator of C-CLASSIC$_{\delta\epsilon}$. We exemplify concept learning using
default knowledge in this framework and show that explicitly expressing default
knowledge may result in simpler concept definitions.



1 Introduction 

The general aim of concept learning is to induce hypotheses from a set of
examples of an unknown target concept. The choice of the concept (and
therefore hypothesis) language and of the example language is very important
in this framework. Inductive Logic Programming (ILP, [15,17]) studies learning
within the framework provided by clausal logic. However, the language is often
restricted to Horn clauses for complexity reasons. Description Logics (DLs)
are other restrictions of first-order logic¹ in which the subsumption
computation and its complexity have been deeply studied [11]. Several ILP
approaches presented a learning framework where the learned theories or the
entailment relation are non-monotonic (e.g. [1,9]) in order to put emphasis on
the problems that cannot be captured by classical definite logic programs. We
propose here to learn in a framework combining rules, default rules and a
description logic able to handle default knowledge. In our approach, concept
learning is performed in two steps. First, we use both default rules and
strict rules to extend the definitions of the examples by adding default
properties and excepted properties. Then, concept learning is performed using
subsumption and Least Common Subsumer algorithms that handle examples and
hypotheses including such default and excepted properties.

¹ Note that comparing the expressive power of DLs and of the restrictions of
first-order logic used in ILP is still an open issue [7].

The description logic used here, C-CLASSICδε [23,22], extends C-CLASSIC
with two non-classical connectives (δ, ε) used to represent default knowledge.
C-CLASSIC is one of the most expressive tractable DLs known so far, and the
extension preserves its good computational properties both for subsumption and
PAC-learnability. The connective δ intuitively represents the common notion of
default. For instance, having δViviparous as a conjunct in the definition of
the concept Mammal states that mammals are generally viviparous. The
connective ε is used to represent a property that is not present in the
description of the concept or of the instance but that should be. Thus, for
instance, being a mammal, an ornithorynchus² should be viviparous since
mammals are generally viviparous. However, an ornithorynchus is "exceptional"
w.r.t. this property (i.e. it has the property Viviparous^ε) as it is an
oviparous mammal.

Default and excepted properties can be deduced by applying default rules
closely related to Reiter's normal defaults. For instance, Reiter's normal
default Bird(x) : Fly(x) / Fly(x) is interpreted as "if x is a bird and if it
is consistent that x can fly then infer that x can fly". Note that this
default rule handles in the same way particular birds that cannot fly, such as
penguins, and non-flying animals such as cats: nothing is deduced. In our
framework, the rule corresponding to the previous normal default is
Bird(x) →d Fly(x), which is interpreted as "if x is a bird and if it is
consistent (i.e. not incoherent) that x can fly then infer that x generally
flies (δFly(x) is inferred), else infer that x is exceptional w.r.t. Fly
(Fly^ε(x) is inferred)". In this framework, we infer that a bird generally
flies and that penguins are exceptional w.r.t. Fly, and nothing is deduced
concerning cats. Such default rules, together with strict rules (i.e.
non-default rules) that express incoherences (e.g. Fly ⊓ Inapt-to-fly → ⊥³),
are used to extend the description of the examples prior to learning.

As in the original C-CLASSIC, disjunctive learning is performed using a
standard greedy set covering algorithm whose generalization operator is the
Least Common Subsumer operator of C-CLASSICδε. The computation of the
subsumption relation of C-CLASSICδε, which is used to check whether a
hypothesis covers an example, has been proved to be correct, complete and
polynomial. Furthermore, C-CLASSICδε is PAC-learnable [23,21,25].

² Ornithorynchus = duck-billed platypus.

³ ⊥ is used to denote incoherences.

This paper is organized as follows. Section 2 gives some needed background
information on the C-CLASSICδε description logic. Learning and the saturation
process in C-CLASSICδε are described in sections 3 and 4, together with a
comparison with learning in C-CLASSIC. Finally, in section 5 we briefly
discuss related work in the DL and ILP fields and we present future work.



2 C-CLASSICδε

Description Logics are a family of knowledge representation formalisms which
stem from KL-ONE [3]. Several systems have been built based on DLs (e.g.
CLASSIC [2], FLEX [18]) and they have been used in real-world applications
(e.g. CLASSIC in [20]). Besides, DLs facilitate the use of background
knowledge and are more expressive than attribute-value representations. The
field of DLs has received increased attention in recent years in the Machine
Learning community (e.g. [14,7,6]). These previous approaches used
terminological languages unable to define concepts with default properties,
whereas allowing default properties in concept definitions is frequently
required in applications where few concepts can be strictly defined with
necessary and sufficient properties [12].

In DLs, a concept is defined as a set of properties satisfied by the
individuals that are instances of the concept. These properties are expressed
by terms that are built from atomic concepts and roles and from a set of
connectives. Concepts are partially ordered by a subsumption relation which
expresses the inclusion relation between concepts and is usually based on a
standard model-based logical semantics. The subsumption relation in
C-CLASSICδε, which is central to the learning task, is presented in section
2.2. Knowledge is mainly separated into two components: a terminological
component (T-box), which contains the definitions of concepts, and an
assertional component (A-box), which contains statements about individuals.
We assume here that the A-box is empty since we represent the examples using
the terminological language presented in section 2.1 (see section 3 for more
details about examples). Section 2.3 presents the Least Common Subsumer
operation in C-CLASSICδε, which is the generalization operator used during
the learning process.

2.1 Terminological Language 

The connectives of C-CLASSICδε are the connectives of C-CLASSIC [7] plus
the connectives δ and ε introduced in ALδε [8]. The terminological language of
C-CLASSICδε is defined using a set R of atomic roles, a set P of atomic
concepts, the constants ⊤ and ⊥, a set I of individuals (called
classic-individuals), and the following syntactic rule (C and D are concepts,
P is an atomic concept, R is an atomic role, u is a real number, n is an
integer and I1 ... In are classic-individuals):

C, D → ⊤                       the most general concept
     | ⊥                       the most specific concept
     | P                       atomic concept
     | ONE-OF {I1 ... In}      concept in extension
     | MIN u                   u is a real number
     | MAX u                   u is a real number
     | C ⊓ D                   concept conjunction
     | ∀R : C                  value restriction
     | R FILLS {I1 ... In}     subset of values for R
     | R AT-LEAST n            cardinality for R (minimum)
     | R AT-MOST n             cardinality for R (maximum)
     | δC                      default concept
     | C^ε                     exception to the concept C

This syntactic rule is used to define terms of C-CLASSICδε.

Example: Student ⊓ δ(publications AT-LEAST 4) ⊓ ∀age: MAX 27 ⊓ publications
FILLS {JAIR, AI} ⊓ ∀publications: (∀year: ONE-OF {97, 98, 99}) describes all
the students who generally have at least four publications, are at most 27
years old, have at least one publication in JAIR and one in AI, and whose
publications were all published in the years 97, 98 or 99.

Defining a concept⁴ means giving a name A to a term C of the C-CLASSICδε
language using the expression A = C.

Example: Mammal = Animal ⊓ δViviparous ⊓ Vertebrate

A T-box of C-CLASSICδε is composed of concept definitions.

2.2 Subsumption in C-CLASSICδε

In DLs, concepts are organized in a taxonomy via a subsumption relation.
Concerning the strict part of C-CLASSICδε, subsumption in C-CLASSICδε is
equivalent to subsumption in C-CLASSIC. More precisely, a concept C is
subsumed by a concept D if C has (explicitly or implicitly) all the properties
of D. In our framework, we must distinguish strict and default properties.
Roughly speaking, a concept C is subsumed by a default property if its
definition contains either the default property, the strict property or the
excepted property. For instance, δFly subsumes concepts having explicitly or
implicitly either δFly, Fly or Fly^ε in their definition, while concepts whose
definition does not mention anything (strict, default, exception, exception of
exception, ...) about Viviparous are not subsumed by δViviparous.

Example:

Bird = Animal ⊓ Has-Wings ⊓ δFlies (a bird generally flies)

Penguin = Animal ⊓ Has-Wings ⊓ δ(Flies^ε) (a penguin is generally exceptional
w.r.t. Flies) ⊓ δInapt-to-fly (a penguin is generally inapt to fly)

SuperPenguin = Animal ⊓ Has-Wings ⊓ (Flies^ε)^ε (a SuperPenguin is an
exception to an exception since it is an exceptional Penguin) ⊓ Inapt-to-fly^ε
(a SuperPenguin is exceptional w.r.t. Inapt-to-fly since it can fly)

With these definitions, Bird subsumes Penguin and SuperPenguin (δFlies
subsumes both δ(Flies^ε) and (Flies^ε)^ε). SuperPenguin is subsumed by
Penguin (δ(Flies^ε) subsumes (Flies^ε)^ε and δInapt-to-fly subsumes
Inapt-to-fly^ε). Note that if Bird and SuperPenguin were defined with the
strict property Fly and Penguin with the strict property Inapt-to-fly, Penguin
would no longer be subsumed by Bird and SuperPenguin would no longer be
subsumed by Penguin.

⁴ Note that cyclic concept definitions are not allowed.

More formally, let C and D be two elements of C-CLASSICδε. C ⊑ D, i.e. D
subsumes C, iff C satisfies the strict properties of D, and satisfies or is
explicitly "exceptional" w.r.t. the default properties of D.
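The informal criterion above can be sketched in a few lines of Python. This is only an illustration under strong assumptions: concepts are flat conjunctions of atomic properties, each conjunct is encoded by a hypothetical `Prop` tuple (atom, number of nested exceptions, default flag), and the "at least as deeply excepted" test is our reading of the Bird/Penguin/SuperPenguin example, not the normalization-based algorithm of [23,22].

```python
from typing import FrozenSet, NamedTuple


class Prop(NamedTuple):
    """One conjunct: an atom under `depth` nested exceptions, possibly a default."""
    atom: str
    depth: int = 0          # number of nested exception (epsilon) connectives
    default: bool = False   # True when the conjunct is prefixed by delta


Concept = FrozenSet[Prop]


def prop_subsumes(general: Prop, specific: Prop) -> bool:
    """Does the single conjunct `general` subsume the conjunct `specific`?"""
    if general.atom != specific.atom:
        return False
    if not general.default:
        # A strict conjunct is only satisfied by the very same strict conjunct.
        return specific == general
    # A default conjunct accepts the strict, default or excepted forms of the
    # same atom, provided they are at least as deeply excepted (assumption).
    return specific.depth >= general.depth


def subsumes(d: Concept, c: Concept) -> bool:
    """True when every conjunct of d is satisfied (or excepted) somewhere in c."""
    return all(any(prop_subsumes(q, p) for p in c) for q in d)


bird = frozenset({Prop("Animal"), Prop("Has-Wings"), Prop("Flies", 0, True)})
penguin = frozenset({Prop("Animal"), Prop("Has-Wings"),
                     Prop("Flies", 1, True), Prop("Inapt-to-fly", 0, True)})
super_penguin = frozenset({Prop("Animal"), Prop("Has-Wings"),
                           Prop("Flies", 2), Prop("Inapt-to-fly", 1)})

# Strict variants, to illustrate the last remark of the example.
bird_strict = frozenset({Prop("Animal"), Prop("Has-Wings"), Prop("Flies")})
penguin_strict = frozenset({Prop("Animal"), Prop("Has-Wings"), Prop("Inapt-to-fly")})
```

With this encoding, `subsumes(bird, penguin)` and `subsumes(penguin, super_penguin)` hold, whereas the strict variants are no longer related by subsumption.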

The definition of subsumption in C-CLASSICδε is based on an "equational
system", called EQ, fully defined in [22]. EQ is a set of axioms defining the
main properties of the C-CLASSICδε connectives (e.g. the axiom A ⊓ B = B ⊓ A
expresses the commutativity of concept conjunction; the axioms A ⊓ δA = A
and A^ε ⊓ δA = A^ε express a subsumption relationship between A and δA (A is
subsumed by δA) and between A^ε and δA (A^ε is subsumed by δA); the axiom
δδA = δA expresses the idempotence of δ).

Let =EQ denote equality (modulo the EQ axioms) between two terms of
C-CLASSICδε. Subsumption in C-CLASSICδε is defined as follows.

Definition 1 (Subsumption) Let C and D be two elements of C-CLASSICδε.
C ⊑ D, i.e. D subsumes C, iff C ⊓ D =EQ C.

In DLs, the subsumption computation (for instance C ⊑ D) is performed in two
steps. First, C and D are expanded (i.e. their definitions are rewritten so
that they are made up exclusively of atomic concepts and roles). This
expansion step allows us to take into account the background knowledge linked
to the T-box. Then, a subsumption algorithm is applied to them.

In [23,22], a polynomial-time, complete and correct subsumption algorithm
based on the equational system has been designed for C-CLASSIC and
C-CLASSICδε. This algorithm computes the normal form of concepts according to
the equational system; this normalization step is also used during the
saturation process. This subsumption is not a purely syntactic relation like
θ-subsumption. It is a semantic relation like logical implication or
generalized subsumption [4]. Indeed, the subsumption takes into account the
whole T-box, which expresses a kind of background knowledge. In other words,
the subsumption relation corresponds to logical implication within
C-CLASSICδε.

2.3 Least Common Subsumer in C-CLASSICδε

As mentioned above, learning in C-CLASSICδε relies both on the subsumption
relation and on the computation of the Least Common Subsumer (LCS)⁵ of two
concept definitions. The definition of the LCS in C-CLASSICδε is as follows:

Definition 2 (LCS in C-CLASSICδε) LCS(A,B) = C ∈ C-CLASSICδε if and only if
A ⊑ C and B ⊑ C (C subsumes both A and B), and there is no D ∈ C-CLASSICδε
such that A ⊑ D, B ⊑ D and D is strictly subsumed by C.

An LCS algorithm has been designed for C-CLASSICδε in [23,24]. It has been
proved that this algorithm is correct and polynomial, and that the LCS is
unique.

⁵ In the framework of DLs, the notion of Least Common Subsumer was introduced
by Borgida, Cohen and Hirsh in [5].

Example:

C = Animal ⊓ Vertebrate ⊓ With-beak ⊓ Oviparous ⊓ Has-teats ⊓ Viviparous^ε
⊓ ∀weight: MIN 20 ⊓ ∀age: MAX 15

D = Animal ⊓ Vertebrate ⊓ Has-teats ⊓ Viviparous ⊓ ∀weight: MIN 10 ⊓
∀age: MAX 10

LCS(C,D) = Animal ⊓ Vertebrate ⊓ Has-teats ⊓ δViviparous ⊓ ∀weight: MIN 10
⊓ ∀age: MAX 15
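The example above can be reproduced with a small Python sketch. It is not the LCS algorithm of [23,24]: it assumes flat conjunctions only, encodes each atomic conjunct as a hypothetical pair (exception depth, default flag), and treats the numeric bounds on roles separately. The rule "two different forms of the same atom generalize to a default" is our reading of the example.

```python
from typing import Callable, Dict, Tuple

# A flat concept: atom -> (exception depth, is_default). Numeric bounds on
# roles are kept apart. This encoding is an assumption made for the sketch.
Atoms = Dict[str, Tuple[int, bool]]
Bounds = Dict[str, float]


def lcs_atoms(a: Atoms, b: Atoms) -> Atoms:
    """Generalize the atomic conjuncts shared by a and b."""
    out: Atoms = {}
    for atom in a.keys() & b.keys():
        if a[atom] == b[atom]:
            out[atom] = a[atom]    # an identical conjunct is kept as is
        else:
            # Differing forms of the same atom only agree "by default".
            out[atom] = (min(a[atom][0], b[atom][0]), True)
    return out


def lcs_bounds(a: Bounds, b: Bounds, pick: Callable[[float, float], float]) -> Bounds:
    """Keep the weaker bound for every role constrained on both sides."""
    return {role: pick(a[role], b[role]) for role in a.keys() & b.keys()}


c_atoms = {"Animal": (0, False), "Vertebrate": (0, False), "With-beak": (0, False),
           "Oviparous": (0, False), "Has-teats": (0, False), "Viviparous": (1, False)}
d_atoms = {"Animal": (0, False), "Vertebrate": (0, False),
           "Has-teats": (0, False), "Viviparous": (0, False)}

lcs_result = lcs_atoms(c_atoms, d_atoms)
weight_min = lcs_bounds({"weight": 20}, {"weight": 10}, min)  # MIN: keep the smaller
age_max = lcs_bounds({"age": 15}, {"age": 10}, max)           # MAX: keep the larger
```

Here `Viviparous^ε` in C and `Viviparous` in D differ, so only the default form survives in `lcs_result`, and the weaker of each pair of bounds is kept.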

3 Learning in C-CLASSIC 

Cohen and Hirsh [7,6] give theoretical and experimental results on the
learnability of description logics. In particular, they prove that C-CLASSIC
is PAC-learnable.

From a practical point of view, the authors propose several algorithms for
learning concepts of C-CLASSIC from positive and negative examples of these
concepts. The language of both concepts and examples is the terminological
language of the DL.

The covers relation that specifies how hypotheses relate to examples is the
subsumption relation: a hypothesis H covers an example e if and only if e ⊑ H
(i.e. H subsumes e). The aim is then to find a hypothesis H that covers all
positive examples (completeness) and none of the negative examples
(consistency). As C-CLASSIC only contains a limited kind of disjunction (the
ONE-OF connective), many target concepts of practical interest cannot be
expressed using a single term of C-CLASSIC. One way to overcome this
limitation is to consider algorithms which learn a disjunction of terms rather
than a single term, i.e. a hypothesis H such that H = H1 ∨ H2 ∨ ... ∨ Hn
(however, note that the connectives δ and ε make it possible to limit the
number of disjuncts used to represent concepts (see section 4.3)). The cover
of an example is then defined as follows:

If H = H1 ∨ H2 ∨ ... ∨ Hn and e is a concept, then H covers e if and only if
∃ Hi, e ⊑ Hi (i.e. Hi subsumes e).

The basic idea behind the LCSLEARNDISJ algorithm described in [7] is to use
the LCS to implement a specific-to-general greedy search for hypotheses that
cover many positive examples and no negative examples (this approach is
similar to GOLEM [16], where multi-clause Prolog predicates are learned).

Example 1

Let E+ = {e1, e2, e3, e4} be a set of positive examples of the concept to
learn and E− = {ce1} a set of negative examples of this concept.

e1 = Animal ⊓ Viviparous ⊓ Vertebrate ⊓ Barks.
e2 = Animal ⊓ Vertebrate ⊓ Oviparous ⊓ Has-teats.
e3 = Animal ⊓ Vertebrate ⊓ Flies ⊓ Quacks.
e4 = Animal ⊓ Vertebrate ⊓ Lives-in-Antartica ⊓ Has-Wings ⊓ Inapt-to-fly.
ce1 = Animal ⊓ Vertebrate ⊓ Lives-in-the-sea ⊓ Scales.

Results:

LCSLEARNDISJ computes the Least Common Subsumer of various subsets of
positive examples. Since all the computed LCSs (e.g. Animal ⊓ Vertebrate)
cover the negative example, no consistent generalization can be performed. As
a consequence, LCSLEARNDISJ returns the disjunction of the descriptions of
the four positive examples: (Animal ⊓ Viviparous ⊓ Vertebrate ⊓ Barks) ∨
(Animal ⊓ Vertebrate ⊓ Oviparous ⊓ Has-teats) ∨ (Animal ⊓ Vertebrate ⊓ Flies
⊓ Quacks) ∨ (Animal ⊓ Vertebrate ⊓ Lives-in-Antartica ⊓ Has-Wings ⊓
Inapt-to-fly).
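The behaviour described for Example 1 can be reproduced with a minimal greedy covering loop. The sketch below is a simplified stand-in, not the LCSLEARNDISJ algorithm of [7]: it assumes that concepts are plain sets of strict atomic properties, so that subsumption reduces to set inclusion and the LCS to set intersection. The function name `learn_disjunction` is hypothetical.

```python
from typing import FrozenSet, List

Concept = FrozenSet[str]   # a conjunction of strict atomic properties


def covers(h: Concept, e: Concept) -> bool:
    """h subsumes e when every property required by h appears in e."""
    return h <= e


def lcs(a: Concept, b: Concept) -> Concept:
    """For plain conjunctions of atoms the LCS is the shared set of properties."""
    return a & b


def learn_disjunction(pos: List[Concept], neg: List[Concept]) -> List[Concept]:
    """Greedy specific-to-general covering; returns the list of disjuncts."""
    remaining = list(pos)
    disjuncts: List[Concept] = []
    while remaining:
        seed = remaining[0]
        for e in remaining[1:]:
            candidate = lcs(seed, e)
            # Generalize only while no negative example becomes covered.
            if not any(covers(candidate, n) for n in neg):
                seed = candidate
        disjuncts.append(seed)
        remaining = [e for e in remaining if not covers(seed, e)]
    return disjuncts


e1 = frozenset({"Animal", "Viviparous", "Vertebrate", "Barks"})
e2 = frozenset({"Animal", "Vertebrate", "Oviparous", "Has-teats"})
e3 = frozenset({"Animal", "Vertebrate", "Flies", "Quacks"})
e4 = frozenset({"Animal", "Vertebrate", "Lives-in-Antartica", "Has-Wings", "Inapt-to-fly"})
ce1 = frozenset({"Animal", "Vertebrate", "Lives-in-the-sea", "Scales"})

hypothesis = learn_disjunction([e1, e2, e3, e4], [ce1])
```

Every pairwise generalization collapses to Animal ⊓ Vertebrate, which covers `ce1`, so the loop keeps each positive example as its own disjunct.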

In this example, we can see that, for instance, e1 and e2 have more in common
than Animal ⊓ Vertebrate since e1 is viviparous and e2 has teats. However, the
relationship between Viviparous and Has-teats cannot be expressed in
C-CLASSIC since it is not strict knowledge (i.e. it is neither true that all
animals that have teats are viviparous nor that all viviparous animals have
teats), and we cannot add Viviparous to e2 since it is oviparous (the same
problem appears with e3, which flies, and e4, which has wings but is inapt to
fly). In other words, e2 and e4 have exceptional properties but C-CLASSIC does
not allow these exceptional properties to be expressed. We show in section 4
how the saturation process makes it possible to learn a better-suited concept
in C-CLASSICδε without explicitly expressing the exceptional properties of e2
and e4.

4 Learning in C-CLASSICδε

Learning in C-CLASSICδε is similar to learning in C-CLASSIC. C-CLASSICδε
has been proved PAC-learnable [23,21]. The same LCSLEARNDISJ algorithm can be
used, since polynomial subsumption and Least Common Subsumer algorithms have
been defined for C-CLASSICδε.

However, the example definitions have to be saturated prior to learning using
background knowledge. A part of this background knowledge is related to
default and excepted properties and is used to add such properties to the
positive and negative examples. The learning problem for our framework is
therefore defined as follows:

Given: a set of T-box statements, a finite set of rules⁶ β (background
knowledge), and sets of C-CLASSICδε concepts E+, E− (positive and negative
examples).

Build: sets of saturated examples E+' and E−' (E+' and E−' are the result of
the saturation process linked to β and applied to E+ and E−).

Find: a hypothesis H (a disjunction of C-CLASSICδε terms) such that H is
complete w.r.t. E+' and consistent w.r.t. E−'.

4.1 Background Knowledge 

The background knowledge β is composed of two sets of rules (C and D are
terms of C-CLASSICδε): a set Def of default rules of the form C →d D, meaning
that if a concept is subsumed by C, it is generally subsumed by D, together
with a set Inc of strict incoherence rules of the form C → ⊥, meaning that if
a concept is subsumed by C, it is incoherent. More precisely, the definitions
of Def and Inc are the following:

Definition 3 (Def) Def is composed of m rules R1, ..., Rm such that Ri =
precondition_i →d Conclusion_i, where precondition_i is a term of C-CLASSIC⁷
and Conclusion_i is a term of C-CLASSIC in which the only allowed concept
conjunctions are in the value restrictions of roles.

Definition 4 (Inc) Inc is composed of n rules R1, ..., Rn such that Ri =
precondition_i → ⊥, where precondition_i is a term of C-CLASSIC.

A rule of Def or Inc is applicable if its precondition subsumes the example.

⁶ The syntax of these rules is defined in section 4.1.



4.2 Saturation Process 

One of the main operations of the saturation process is to detect a potential
incoherence between the definition of an example and the conclusion of an
applicable default rule, in order to add to the example either a default
property (no incoherence has been detected) or an excepted property (an
incoherence has been detected).

We distinguish two kinds of incoherences: incoherences of type 1 and
incoherences of type 2.

An incoherence of type 1 corresponds to an incoherence linked to one or more
general axioms concerning the connectives of the language. For instance, child
AT-LEAST 2 ⊓ child AT-MOST 1 is incoherent and, more generally, for every role
R, R AT-LEAST m ⊓ R AT-MOST n is incoherent if m > n. These axioms are
expressed in the equational system of C-CLASSIC [23].

An incoherence of type 2 corresponds to an incoherence linked to a rule
belonging to Inc (e.g. Inapt-to-fly ⊓ Flies is incoherent).

It must be highlighted that in our framework incoherences are only linked to
strict knowledge. Indeed, a default property cannot be incoherent. This is the
reason why, when there is no conflict between the conclusion of the default
rule and the current description of e', we add δConclusion rather than
Conclusion, which could later be in conflict with knowledge issued from other
rules. Besides, an excepted property never leads to an incoherence since an
exception to a concept does not correspond to a negation of this concept. For
instance, Fly^ε ⊓ Fly is not incoherent.

The fact that incoherences are linked only to strict knowledge has two
implications. First, it allows us to be sure that the addition of default and
excepted properties will not lead to further incoherences. This guarantees the
monotonicity of the extension process. Second, we can state that a term T of
C-CLASSICδε is incoherent if and only if the term T', equivalent to T without
default and excepted properties, is incoherent. Thus, incoherences of type 1
can be detected by translating a term of C-CLASSICδε into a term of C-CLASSIC
(i.e. by removing default and excepted properties) and by applying the
normalization procedure defined for C-CLASSIC in [23]⁸.

⁷ We could extend the process by using terms of C-CLASSICδε in the theory but
our goal is to show that we can obtain default and excepted properties from
rules whose precondition and conclusion are described using strict properties.

The sketch of the extension algorithm is the following. Let e be an example
described by a term C of C-CLASSICδε. For each rule of Def we check whether C
is subsumed by the premise of the rule. To do so, the default and excepted
properties of C are removed and C', the term of C-CLASSIC obtained, is
compared with the premise of the rule by applying the subsumption algorithm
designed for C-CLASSIC. If the premise subsumes C', we check whether the
conclusion of the default rule is incoherent with C' (i.e. whether it leads to
incoherences of type 1 or 2). Incoherences of type 1 are detected by computing
the normal form of C' ⊓ Conclusion using the normalization algorithm for
C-CLASSIC terms. If this normal form is equivalent to ⊥, there is an
incoherence. Incoherences of type 2 are detected by checking whether the
premise of a rule belonging to Inc subsumes C' ⊓ Conclusion. If any
incoherence is detected, the conclusion is excepted and added to C (i.e.
Conclusion^ε is added to C); otherwise the conclusion of the rule is added to
C by default (i.e. δConclusion is added to C).

The extension algorithm is as follows:

Inputs: a term C of C-CLASSICδε, a set Def = {R1, ..., Rn} of "default rules",
a set Inc = {R'1, ..., R'm} of strict incoherence rules.

Output: ENF-C, the extended normal form of C.

External procedures used:

Remove_δε(d): transforms a term d of C-CLASSICδε into a term of C-CLASSIC
by removing the default and excepted properties of d (since incoherences
concern only strict properties).

Subsume(C,D): returns true if C subsumes D, C and D being two terms of
C-CLASSIC.

NF'(d): computes the normal form of a term d of C-CLASSIC.

BEGIN
  C' ← Remove_δε(C)
  For all Ri of Def such that Ri = Premise_i →d Conclusion_i and Subsume(Premise_i, C')
  begin
    Add ← false   {* Add is true if an excepted property has been added *}
    {* Search for incoherences of type 1 *}
    if NF'(C' ⊓ Conclusion_i) = ⊥   {* Conclusion_i is incoherent with the description of C' and C *}
    then begin
      C ← C ⊓ (Conclusion_i)^ε
      Add ← true
    end
    {* Search for incoherences of type 2 *}
    if not Add then
      if there exists in Inc a premise D such that Subsume(D, C' ⊓ Conclusion_i)
      then C ← C ⊓ (Conclusion_i)^ε
      else C ← C ⊓ δConclusion_i
  end For all
END

⁸ Applying this procedure to an incoherent term normalizes the term to ⊥,
which denotes incoherences.

Example

We consider Example 1 described in section 3.

Let β be a background knowledge made of two sets Inc and Def.

Def = {R1: Animal ⊓ Has-teats →d Viviparous, R2: Animal ⊓ Has-Wings →d
Flies, R3: Animal ⊓ Lives-in-the-sea ⊓ Scales →d Gills} is a set of default
rules meaning that generally animals having teats are viviparous, that
generally animals having wings fly, and that generally animals with scales and
living in the sea have gills.

Inc = {R4: Viviparous ⊓ Oviparous → ⊥, R5: Inapt-to-fly ⊓ Flies → ⊥} is a set
of strict rules meaning respectively that an example cannot be both oviparous
and viviparous and that it is impossible to fly and to be inapt to fly. We now
illustrate the saturation process on Example 1.

e'1 = e1
e'2 = e2 ⊓ Viviparous^ε
e'3 = e3
e'4 = e4 ⊓ Flies^ε
ce'1 = ce1 ⊓ δGills

Some explanations about the saturation of e2:

e2 verifies (i.e. is subsumed by) the precondition of R1. The addition of
Viviparous to e2 leads to an incoherence (Viviparous ⊓ Oviparous is subsumed
by ⊥ from R4). The property Viviparous^ε is therefore added to e2. Adding this
property makes it possible to highlight that a part of e2 (Oviparous) is
incoherent with Animal ⊓ Has-teats →d Viviparous. This information can be
useful during the learning process described in the next section.
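The saturation of Example 1 can be replayed with a short Python sketch. It is a simplification under stated assumptions: examples are sets of strict atoms (so C' equals C and incoherences of type 1 cannot arise), rule premises are sets of atoms, and the function name `saturate` and the returned pair are hypothetical. Only the type 2 test of the extension algorithm is modelled.

```python
from typing import FrozenSet, List, Set, Tuple

Strict = FrozenSet[str]

# Default rules as (premise, conclusion atom) and incoherence rules as premises.
DEF: List[Tuple[Strict, str]] = [
    (frozenset({"Animal", "Has-teats"}), "Viviparous"),              # R1
    (frozenset({"Animal", "Has-Wings"}), "Flies"),                   # R2
    (frozenset({"Animal", "Lives-in-the-sea", "Scales"}), "Gills"),  # R3
]
INC: List[Strict] = [
    frozenset({"Viviparous", "Oviparous"}),   # R4
    frozenset({"Inapt-to-fly", "Flies"}),     # R5
]


def saturate(strict: Strict) -> Tuple[Set[str], Set[str]]:
    """Return (default atoms, excepted atoms) to add to a strict description."""
    defaults: Set[str] = set()
    excepted: Set[str] = set()
    for premise, conclusion in DEF:
        if not premise <= strict:      # the rule is not applicable
            continue
        extended = strict | {conclusion}
        # Type 2 incoherence: some Inc premise subsumes the extended term.
        if any(bad <= extended for bad in INC):
            excepted.add(conclusion)
        else:
            defaults.add(conclusion)
    return defaults, excepted


e1 = frozenset({"Animal", "Viviparous", "Vertebrate", "Barks"})
e2 = frozenset({"Animal", "Vertebrate", "Oviparous", "Has-teats"})
e3 = frozenset({"Animal", "Vertebrate", "Flies", "Quacks"})
e4 = frozenset({"Animal", "Vertebrate", "Lives-in-Antartica", "Has-Wings", "Inapt-to-fly"})
ce1 = frozenset({"Animal", "Vertebrate", "Lives-in-the-sea", "Scales"})
```

As in the extension algorithm, every rule is tested against the original strict description, so the order in which the default rules are tried does not change the result.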

4.3 Learning in C-CLASSICδε vs. C-CLASSIC

Using Example 1 and the background knowledge described in the previous
section, we now show that, given the same positive and negative examples,
C-CLASSICδε makes it possible to learn disjunctive concepts represented with
fewer disjuncts than concepts learned in C-CLASSIC.

LCSLEARNDISJ is applied to the saturated examples. The first disjunct learned
by the algorithm is LCS(e'1,e'2) (i.e. Animal ⊓ δViviparous⁹ ⊓ Vertebrate).
The examples e'1 and e'2 are then removed from the learning set. The next
learned disjunct is LCS(e'3,e'4), which covers these two positive examples and
no negative example. The algorithm returns the following hypothesis: (Animal ⊓
δViviparous ⊓ Vertebrate) ∨ (Animal ⊓ δFlies ⊓ Vertebrate).

⁹ Note that this property does not belong to the LCS computed from the
C-CLASSIC definitions of e1 and e2. However, this property is crucial since it
prevents the negative example from being subsumed (recall that ce'1 has the
properties Animal ⊓ Vertebrate).

The following instance: e = Animal ⊓ Vertebrate ⊓ Lives-in-Australia ⊓ Wings
⊓ Big-feet ⊓ Inapt-to-fly is recognized by the definition learned in
C-CLASSICδε: the saturation process adds Flies^ε to e and Animal ⊓ δFlies ⊓
Vertebrate subsumes the saturated instance. Note that e is not recognized by
the definition learned in C-CLASSIC (see section 3).
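The two-disjunct result can be checked with a compact sketch that runs a greedy covering loop on the saturated examples. All of it is an assumption-laden simplification rather than the authors' implementation: concepts are dictionaries mapping an atom to a hypothetical pair (exception depth, default flag), `covers` and `lcs` follow our reading of sections 2.2 and 2.3 for flat conjunctions, and the saturated examples are written down by hand.

```python
from typing import Dict, List, Tuple

# atom -> (exception depth, is_default): a toy encoding of a flat conjunction
Concept = Dict[str, Tuple[int, bool]]


def strict(*atoms: str) -> Concept:
    """Build a concept made of strict, non-excepted atoms."""
    return {a: (0, False) for a in atoms}


def covers(h: Concept, e: Concept) -> bool:
    """h subsumes e: strict conjuncts must match, defaults may be excepted."""
    for atom, (depth, default) in h.items():
        if atom not in e:
            return False
        if default:
            if e[atom][0] < depth:
                return False
        elif e[atom] != (depth, False):
            return False
    return True


def lcs(a: Concept, b: Concept) -> Concept:
    """Keep shared atoms; forms that differ are generalized to a default."""
    return {atom: a[atom] if a[atom] == b[atom]
            else (min(a[atom][0], b[atom][0]), True)
            for atom in a.keys() & b.keys()}


def learn_disjunction(pos: List[Concept], neg: List[Concept]) -> List[Concept]:
    """Greedy specific-to-general covering over the saturated examples."""
    remaining, disjuncts = list(pos), []
    while remaining:
        seed = remaining[0]
        for e in remaining[1:]:
            candidate = lcs(seed, e)
            if not any(covers(candidate, n) for n in neg):
                seed = candidate
        disjuncts.append(seed)
        remaining = [e for e in remaining if not covers(seed, e)]
    return disjuncts


# Saturated examples e'1 ... e'4 and ce'1, written by hand from section 4.2.
e1s = strict("Animal", "Viviparous", "Vertebrate", "Barks")
e2s = {**strict("Animal", "Vertebrate", "Oviparous", "Has-teats"), "Viviparous": (1, False)}
e3s = strict("Animal", "Vertebrate", "Flies", "Quacks")
e4s = {**strict("Animal", "Vertebrate", "Lives-in-Antartica", "Has-Wings", "Inapt-to-fly"),
       "Flies": (1, False)}
ce1s = {**strict("Animal", "Vertebrate", "Lives-in-the-sea", "Scales"), "Gills": (0, True)}

hypothesis = learn_disjunction([e1s, e2s, e3s, e4s], [ce1s])
```

The excepted properties added by saturation are what let e'1 and e'2 (and likewise e'3 and e'4) generalize to a default conjunct that the negative example does not mention.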

5 Related and Further Work 

Cohen and Hirsh suggest that learning systems based on description logics may
prove to be a useful complement to ILP systems. One issue is the investigation
of combining our framework with non-monotonic frameworks in ILP. In [9], the
authors present a framework for learning non-monotonic logic programs. Given a
background theory and a set of examples, they generate a hypothesis, within
the language bias of a subclass of non-monotonic logic programs¹⁰, that covers
all the positive examples and none of the negative examples. In such theories,
in order to decide whether an atom A holds, one needs to show that A can be
derived classically using some rule r for A and that ¬A cannot be derived
classically using some rule r' which is designated higher than the rule r by
the priority relation on the program.

For instance, consider the background theory B:

bird(x) ← penguin(x)
penguin(x) ← superpenguin(x)
bird(a), bird(b), penguin(c), penguin(d), superpenguin(e), superpenguin(f)

Consider also the set of examples E = E+ ∪ E− where E+ = {flies(a), flies(b),
flies(e), flies(f)} and E− = {flies(c), flies(d)}.

The result of the algorithm is the hypothesis H:

R1: flies(x) ← bird(x)
R2: ¬flies(x) ← penguin(x)
R3: flies(x) ← superpenguin(x)

where R1 has lower priority than R2 and R2 has lower priority than R3.
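How the priorities settle conflicts between R1, R2 and R3 can be illustrated with a small Python sketch. It only evaluates the learned hypothesis on the individuals of B and does not reproduce the learning algorithm of [9]. The encoding of rules as (priority, sign, body) triples and the function names are assumptions made for the illustration.

```python
from typing import Dict, List, Set, Tuple

FACTS: Dict[str, Set[str]] = {
    "bird": {"a", "b"},
    "penguin": {"c", "d"},
    "superpenguin": {"e", "f"},
}
# Background theory B as (body, head): bird <- penguin, penguin <- superpenguin.
IMPLIES: List[Tuple[str, str]] = [("penguin", "bird"), ("superpenguin", "penguin")]

# Hypothesis H as (priority, conclusion sign, body predicate); higher wins.
RULES: List[Tuple[int, bool, str]] = [
    (1, True, "bird"),           # R1: flies(x) <- bird(x)
    (2, False, "penguin"),       # R2: not flies(x) <- penguin(x)
    (3, True, "superpenguin"),   # R3: flies(x) <- superpenguin(x)
]


def holds(pred: str, x: str) -> bool:
    """Classical derivation of pred(x) from the facts and the rules of B."""
    if x in FACTS.get(pred, set()):
        return True
    return any(head == pred and holds(body, x) for body, head in IMPLIES)


def flies(x: str) -> bool:
    """flies(x) holds if a supporting rule fires and no higher-priority
    rule with the opposite conclusion fires."""
    fired = [(prio, sign) for prio, sign, body in RULES if holds(body, x)]
    return any(
        sign and not any(p > prio and not s for p, s in fired)
        for prio, sign in fired
    )
```

For a penguin such as `c`, R1 fires but is overridden by R2. For a superpenguin such as `e`, R3 fires and nothing of higher priority contradicts it.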

In such a non-monotonic framework, the goal is to learn strict predicates
(e.g. Fly, for which penguin is a negative example) by generating default
rules. This approach is not suited to learning strict concepts having default
properties in their definition (e.g. Bird, for which penguin is a positive
example despite the fact that it is exceptional w.r.t. the Fly property).
Further work could consist in using default rules learned in this
non-monotonic framework (e.g. R1, R2, R3¹¹) in order to improve learning in
C-CLASSICδε (for instance, positive and negative examples of the concept Bird
could be saturated with δ(Flies^ε) or (Flies^ε)^ε using R1, R2 and R3). Note
that the problem of learning with non-monotonic background knowledge is one of
the possible directions for further research listed in [9]. Abduction [10,13]
is also the basis for non-monotonic learning frameworks, providing a uniform
technique to handle negation as failure, incomplete predicates and integrity
constraints. We also need to compare our work with these approaches.

¹⁰ Theories whose set of contradictory rules can be separated into classes
where the rules in each class are totally ordered by the priority relation of
the theory.

¹¹ The presence of ¬ is not a problem since it is straightforward to add
negation on atomic concepts in C-CLASSICδε (and the axiom A ⊓ ¬A = ⊥ in the
equational system) in order to take such incoherences into account.

As a conclusion, in this paper we have defined a problem setting for concept
learning in a framework combining a description logic that makes it possible
to define concepts with default and excepted properties, and background
knowledge represented by rules and default rules. We proposed a prior
saturation of the examples using the background knowledge and we showed that
learning from extended examples can lead to the construction of a more
satisfactory learned concept. More precisely, the learned concepts are smaller
in size (they have fewer disjuncts) and they are more general, covering more
examples which can be identified as belonging to the target concept. The
presence of defaults is a way to improve the expressive power of the DL (few
concepts can be defined with necessary and sufficient properties using only
strict knowledge) and therefore to improve the learning process. Finally, note
that the application of default rules is difficult since it can lead to
ambiguities. For instance, in [19] the authors integrated defaults in DLs
using incident rules of the form c1 → c2 meaning "whenever an object is an
instance of c1 it is also an instance of c2 unless this is in conflict with
some other piece of knowledge". This approach requires handling multiple
extensions by defining preference criteria (e.g. the preferred models contain
the most specific knowledge or the most applied defaults). In our framework,
the connectives δ and ε allow us to avoid this problem.

Abstract. In a combinatorial auction problem, bidders are allowed to
bid on a bundle of items. The auctioneer has to select a subset of the
bids so as to maximize the price it gets, while making sure that it does
not accept multiple bids that share an item, as each item can be sold
only once. In this paper we show how the combinatorial auction problem
and many of its extensions can be expressed in logic programming based
systems such as Smodels and dlv. We propose this as an alternative to
the standard syntax-specific specialized implementations that are much
harder to modify and extend when faced with generalizations and
additional constraints.



1 Introduction and Motivation 

In a simple auction several bidders bid for an item and the auctioneer selects the 
highest bid. Often bidders need a bundle of items, where the worth of the whole 
bundle - to the bidder - may be more than the sum of the individual worth of 
each item in the bundle. For example, let A and B be two adjacent real estate 
plots. A single developer can often make more money developing both plots 
together than two different developers developing A and B separately without 
co-operating with each other. This happens if, say, both A and B are needed to
create a lucrative golf course while A and B separately can only be used for less
profitable purposes. The opposite may be true in some cases too. The cases that
are often mentioned with regard to both situations are airport landing slots [9],
bandwidth auctions, real estate auctions, and transportation exchanges [11].

In such cases participating in parallel or sequential auctions for each item
in a bundle desired by a bidder is risky, as the bidder may not win all items
in the bundle. Moreover it would be difficult for him to individually price each
item in the bundle. One way to avoid such problems is to have combinatorial
auctions where bidders are allowed to bid on bundles. Although this is good for
the bidder, the seller’s problem of deciding which bids to accept becomes harder,
as different bidders can make up their own bundles to bid on.

Recently, there has been a lot of interest in this problem because of its
applicability in Internet based auctions, B2B exchanges, and multi-agent
systems [16,7]. There have been several papers [12,13,2,4,6,15,10] that analyze
this problem and present algorithms and techniques to solve it and a few of its
generalizations. One starting point that guides research on this is the result
from [10] which shows the problem of finding the optimal set of bids (that
maximize the seller’s take) to be NP-complete.

So far there are three different approaches for solving this problem: complete 
algorithms [12,2,10] that find an optimal solution in the general case, incom- 
plete methods [4] that find high quality solutions quickly, and identification of 
tractable special cases and algorithms for those cases [15,13]. The other possible 
approach of finding approximation algorithms is blocked by the result from [12] 
that shows that no polynomial algorithm can guarantee a solution that is close 
to optimal. 

In this paper we follow the first approach of obtaining optimal solutions in the 
general case. Our methodology is different from the earlier approaches [12,2,10] 
in that we would like to represent the problem in a declarative knowledge repre- 
sentation language such that optimal ‘models’ of the representation correspond 
to optimal solutions. This is similar to the methodology of satisfiability based 
planning [5] where the planning problem is represented as a propositional the- 
ory, and each model of this theory encodes a plan. The main motivation behind 
our approach of using a declarative knowledge representation language is that 
we would like the process of adding constraints, or making a generalization,
to be easier. This differs from the other approaches [12,13,2,6] where major
changes were needed to move from single unit combinatorial auctions to
multi-unit combinatorial auctions. Also, as mentioned in [13], additional
generalizations necessitate changes in the code, which require knowledge of the
structure of the code and hence can only be made by people adequately familiar
with the original code. In contrast we will show that when using a declarative
knowledge representation language, a further generalization or the addition of
new constraints often amounts to adding a few extra rules, without needing
detailed knowledge of the original code or its structure.

The declarative language that we will use throughout this paper is
Smodels [8,14]¹, an extension of logic programming with answer set semantics [3].
It has new constructs such as cardinality and weight constraints, and
optimization statements. It is preferable over propositional logic as it is more
expressive in terms of being able to express transitive closure, causality, and
aggregation. Moreover, it is a non-monotonic language and hence more suitable
for knowledge representation, and finally it includes optimization statements.
(A more detailed argument about the advantages of logic programming with answer
set semantics over other logics is given in the draft of a book by the first
author available at http://www.public.asu.edu/~cbaral/.) Smodels is preferable
over ILP (integer linear programming) because it can represent logical
specifications more easily. Although ILP can accommodate propositional logic, it
has not been shown how it can accommodate non-monotonic features of a logic
program.

¹ Strictly speaking, Smodels is a system that started off as implementing the
answer set semantics of logic programs and now has several new constructs. By
the Smodels language we refer to the extension of logic programs that is used by
the Smodels system. We would like to mention that some of the encodings in this
paper can also be expressed in the language of the dlv system [1]. Due to lack
of space we only focus on the Smodels system.

Our goal in this paper is to show how single unit and multi unit combinatorial 
auction problems can be specified and declaratively solved using Smodels, and 
how it is easy to add additional constraints and further generalizations to the 
original problem using Smodels. We hope this representation will serve as a 
benchmark to the logic programming, knowledge representation and declarative 
problem solving communities to develop more efficient implementations of the 
Smodels language such that the timing of obtaining solutions of combinatorial 
auction problems specified in Smodels is comparable to that of the specialized 
algorithms/programs in [12,13,2,6]. 

2 Background: The Smodels Language 

A logic program is a collection of rules of the form

$a_0 \leftarrow a_1, \ldots, a_m, \mathit{not}\ a_{m+1}, \ldots, \mathit{not}\ a_n$ (1)

where the $a_i$'s are atoms. For an atom $a$, “not $a$” is referred to as a naf-literal.
Intuitively, the above rule means that if $a_1, \ldots, a_m$ are true and
$a_{m+1}, \ldots, a_n$ can be assumed to be false then $a_0$ must be true. Logic
programs whose rules do not have not in the body - referred to as definite
programs - have unique answer sets, which are the least models of the theory
obtained by treating rules of the form $a_0 \leftarrow a_1, \ldots, a_m$ as the
classical formula $a_1 \wedge \ldots \wedge a_m \Rightarrow a_0$. Given a logic
program $P$ and a set of atoms $S$, the Gelfond-Lifschitz transformation $P^S$ is
defined as the set of rules obtained from $P$ by removing all rules from $P$ whose
body contains not $b$ such that $b \in S$, and then removing the naf-literals from
the rest of the rules. A set $S$ of atoms is said to be an answer set of a logic
program $P$ if $S$ is the answer set of the definite program $P^S$.
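
The definition above can be sketched in a few lines of Python. This is only an illustration for ground programs, not how the Smodels system works internally: the rule representation (head, positive body, negative body) and all function names are assumptions of the sketch, and the enumeration simply tests every subset of atoms.

```python
from itertools import chain, combinations

# Each rule is (head, positive_body, negative_body). This tuple layout and all
# function names are assumptions made for this sketch.


def reduct(program, candidate):
    """Gelfond-Lifschitz transformation of `program` with respect to `candidate`."""
    # Drop every rule whose body contains "not b" with b in the candidate set,
    # then strip the naf-literals from the rules that remain.
    return [(head, pos) for head, pos, neg in program
            if not set(neg) & candidate]


def least_model(definite_program):
    """Least model of a definite program, computed by forward chaining."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in definite_program:
            if head not in model and set(pos) <= model:
                model.add(head)
                changed = True
    return model


def is_answer_set(program, candidate):
    return least_model(reduct(program, candidate)) == candidate


def answer_sets(program):
    """Enumerate answer sets by testing every subset of atoms (exponential)."""
    atoms = sorted({h for h, _, _ in program}
                   | {a for _, pos, neg in program for a in chain(pos, neg)})
    subsets = chain.from_iterable(
        combinations(atoms, k) for k in range(len(atoms) + 1))
    return [set(s) for s in subsets if is_answer_set(program, set(s))]


# A small ground program: atom bid_a is a fact, and exactly one of
# sel_a / not_sel_a is derived in each answer set.
program = [
    ("bid_a", [], []),
    ("sel_a", ["bid_a"], ["not_sel_a"]),
    ("not_sel_a", ["bid_a"], ["sel_a"]),
]

for answer_set in answer_sets(program):
    print(sorted(answer_set))
```

The example program has two answer sets, one containing sel_a and one containing not_sel_a, which is the pattern used later to choose bids.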

In the Smodels language, each of the atoms a_0, ..., a_m in rule (1) can be
replaced by cardinality expressions and weight expressions. An example of a
cardinality expression is:

3 {sold(X) : item(X)} 6

which is true in an answer set if the number of items that are sold is between
3 and 6 (inclusive). We can encode the value each item is sold for by a weight
declaration of the form:

weight sold(a) = 8.

which would mean that item a was sold for $8. Now the weight expression

23 [sold(X) : item(X)] 36

will be true in an answer set if the total value of those items that are sold is
between 23 and 36 (inclusive).
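
As a rough illustration of how these two kinds of expressions are evaluated against a given answer set, consider the following Python sketch. The item names, the weights other than that of sold(a), and the function names are hypothetical; the sketch only checks the bounds and does not model how Smodels grounds or searches.

```python
def cardinality_holds(lower, upper, literals, answer_set):
    """True if the number of listed atoms in the answer set lies in [lower, upper]."""
    count = sum(1 for atom in literals if atom in answer_set)
    return lower <= count <= upper


def weight_holds(lower, upper, literals, weights, answer_set):
    """True if the summed weight of listed atoms in the answer set lies in [lower, upper]."""
    total = sum(weights[atom] for atom in literals if atom in answer_set)
    return lower <= total <= upper


# Hypothetical items a..d; only the weight of sold(a) comes from the text.
sold_atoms = [("sold", item) for item in "abcd"]
weights = {("sold", "a"): 8, ("sold", "b"): 10,
           ("sold", "c"): 7, ("sold", "d"): 20}

# An answer set in which items a, b and c are sold.
answer_set = {("sold", "a"), ("sold", "b"), ("sold", "c")}

# 3 {sold(X) : item(X)} 6 holds: three items are sold.
print(cardinality_holds(3, 6, sold_atoms, answer_set))
# 23 [sold(X) : item(X)] 36 holds: the total value is 8 + 10 + 7 = 25.
print(weight_holds(23, 36, sold_atoms, weights, answer_set))
```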
Optimization statements are syntactically similar to weight and cardinality
expressions except that the lower and upper bounds are dropped and the keyword
maximize or minimize is written on the left hand side. For example, if we want
to obtain the answer sets where the number of items that are sold is maximum
then we need to write the following:

maximize {sold(X) : item(X)}.

Smodels allows multiple optimization statements and treats them as a compound
optimization through a lexicographic ordering among the optimization
statements. A more formal characterization of the Smodels language is given
in [14].

3 Single Unit Combinatorial Auction 

We explain the single unit combinatorial auction problem through an example.
The auctioneer has the set of items {1, 2, 3, 4}, and the buyers submit bids
{a, b, c, d, e} where a consists of ({1, 2, 3}, 24), meaning that the bid a is
for the bundle {1, 2, 3} and its price is $24. Similarly b consists of
({2, 3}, 9), c consists of ({3, 4}, 8), d consists of ({2, 3, 4}, 25), and e
consists of ({1, 4}, 15). The winner determination problem is to accept a subset
of the bids with the stipulation that no two bids containing the same item can
be accepted, so as to maximize the total price fetched. We now present an
Smodels encoding (which is both a specification and a program) of this example.

3.1 Specifying the Domain 

1. We specify the bid names and their values as follows: 

bid(a). weight sel(a) = 24. bid(b). weight sel(b) = 9. bid(c). weight sel(c) = 8. 
bid(d). weight sel(d) = 25. bid(e). weight sel(e) = 15. 

2. We specify the items as follows: item(1..4). 

3. We specify the composition of each bid, in terms of the items it consists
of, as follows.

in(1,a). in(2,a). in(3,a). in(2,b). in(3,b). in(3,c). in(4,c).
in(2,d). in(3,d). in(4,d). in(1,e). in(4,e).

3.2 The General Rules 

We have the following general rules which, together with the domain specific
rules from the previous subsection, when run using Smodels will give us the
winning bids.

1. The following two rules label each bid as either selected or not selected.

sel(X) ← bid(X), not not_sel(X).
not_sel(X) ← bid(X), not sel(X).

They can be replaced by the following single Smodels rule:

{sel(X)} ← bid(X).
2. The following enforces the constraint that two different bids containing the
same item can not both be selected.

← sel(X), sel(Y), X ≠ Y, in(I, X), in(I, Y).

The above Smodels rule does not follow the syntax of rules in Section 2.
Such rules of the form

$\leftarrow a_1, \ldots, a_m, \mathit{not}\ a_{m+1}, \ldots, \mathit{not}\ a_n.$

with empty head mean that there can not be answer sets that make the
body true. Such rules can be thought of as the following rule, where $f$ is a
new atom, that satisfies the syntax of (1) in Section 2.

$f \leftarrow \mathit{not}\ f, a_1, \ldots, a_m, \mathit{not}\ a_{m+1}, \ldots, \mathit{not}\ a_n.$

3. The following optimization statement specifies that we must select bids such
that their total price is maximized.

maximize [sel(X) : bid(X)].

When we run the above program in the Smodels system using the command

lparse aucl.sm | smodels 0

the system first outputs the answer set {a}, and then outputs the optimal
answer set {d}, indicating that the latter has a higher total price.
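
The result reported above can be cross-checked independently of Smodels with a small brute-force search. The Python sketch below is not the encoding of this section; it simply enumerates every subset of the five example bids, keeps those in which no item occurs twice, and returns the one with the highest total price. The function names are assumptions of the sketch.

```python
from itertools import combinations

# The example bids of this section: name -> (bundle of items, price).
bids = {
    "a": ({1, 2, 3}, 24),
    "b": ({2, 3}, 9),
    "c": ({3, 4}, 8),
    "d": ({2, 3, 4}, 25),
    "e": ({1, 4}, 15),
}


def feasible(selection, bids):
    """True if no item appears in two different selected bids."""
    seen = set()
    for name in selection:
        items = bids[name][0]
        if seen & items:
            return False
        seen |= items
    return True


def best_selection(bids):
    """Exhaustive winner determination; exponential in the number of bids."""
    best, best_price = set(), 0
    names = sorted(bids)
    for k in range(len(names) + 1):
        for selection in combinations(names, k):
            if feasible(selection, bids):
                price = sum(bids[name][1] for name in selection)
                if price > best_price:
                    best, best_price = set(selection), price
    return best, best_price


print(best_selection(bids))
```

On this example the search selects bid d alone with total price 25, in agreement with the optimal answer set {d}; the runners-up are {a} and {b, e}, both with price 24.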



3.3 Formal Characterization 

In a combinatorial auction (single unit case), the auctioneer has m items
$M = \{1, \ldots, m\}$ and the buyers submit n bids $B = \{B_1, \ldots, B_n\}$,
where each bid is a tuple $B_i = (S_i, p_i)$, with $S_i \subseteq M$, and $p_i$
is a price. The winner determination problem [13] is an assignment of bids as
accepted ($x_i = 1$) or not ($x_i = 0$), for $1 \le i \le n$, that satisfies the
constraint

$\sum_{1 \le i \le n,\ j \in S_i} x_i \le 1$ for $1 \le j \le m$; and maximizes $\sum_{i=1}^{n} p_i \times x_i$.

The above characterization can be related to the Smodels encoding as follows: 

Theorem 1. For a single unit combinatorial auction problem with integer
prices, each solution to the winner determination problem corresponds to an
optimal answer set of the encoding described in Sections 3.1-3.2 and vice-versa.

3.4 Encoding in dlv 

The dlv system [1] is also an implementation of an extension of logic
programming with additional constructs. It allows disjunctions in the head of
rules and captures the second level of the polynomial hierarchy. Among its
additional constructs are weak constraints, which are of the form:

:~ $p_1, \ldots, p_m, \mathit{not}\ q_1, \ldots, \mathit{not}\ q_n$. [weight : level]

Given a program with weak constraints, its best answer sets are obtained
by first obtaining the answer sets without considering the weak constraints,
ordering them based on the weight and priority level of the weak
constraints they violate, and then selecting the ones that violate the minimum.
In the presence of both weight and priority level information, the minimization
is done with respect to the weight of the constraints of the highest priority,
then the next highest priority and so on.

The encoding in Sections 3.1 and 3.2 can alternatively be encoded in dlv² as
follows:

1. We have the bid atoms from part 1 of Section 3.1, and in atoms from part 3
of Section 3.1.

2. We can either have the rules in part 1 of Section 3.2 or use disjunction and
have the rule: sel(X) ∨ not_sel(X) ← bid(X).

3. We have the constraint in part 2 of Section 3.2.

4. Finally, instead of the optimization statement in part 3 of Section 3.2, we
have the following weak constraints.

:~ not sel(a). [24 : 1]
:~ not sel(b). [9 : 1]
:~ not sel(c). [8 : 1]
:~ not sel(d). [25 : 1]
:~ not sel(e). [15 : 1]
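
Why minimizing the weight of violated weak constraints gives the same winners as the maximize statement can be seen from a short calculation: the penalty of a selection is the total of all bid prices minus the price of the selected bids. The following Python sketch checks this on the example by brute force; it does not call dlv, and the helper names are assumptions of the sketch.

```python
from itertools import combinations

# The example bids of Section 3: name -> (bundle of items, price).
bids = {
    "a": ({1, 2, 3}, 24),
    "b": ({2, 3}, 9),
    "c": ({3, 4}, 8),
    "d": ({2, 3, 4}, 25),
    "e": ({1, 4}, 15),
}


def conflict_free(selection):
    """True if no item appears in two different selected bids."""
    items = [item for name in selection for item in bids[name][0]]
    return len(items) == len(set(items))


def revenue(selection):
    return sum(bids[name][1] for name in selection)


def penalty(selection):
    """Total weight of the violated weak constraints ':~ not sel(x). [price : 1]'."""
    return sum(price for name, (_, price) in bids.items()
               if name not in selection)


candidates = [set(c)
              for k in range(len(bids) + 1)
              for c in combinations(sorted(bids), k)
              if conflict_free(c)]

best = min(candidates, key=penalty)
print(sorted(best), penalty(best), revenue(best))
```

For every candidate the penalty and the revenue add up to the same constant (81 in this example), so the selection with the least penalty is also the one with the highest revenue.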

4 Combinatorial Auction with CNF Bids 

In this section we show how the single unit combinatorial auction specification
can be generalized such that a bidder can specify some options between his bids.
For example a bidder may want to specify that only one of his bids g and h be
accepted, but not both. This can be generalized further such that a bidder
can specify a CNF³ bid [4], which is a conjunction of (ex-or) disjunctions of
items such that one item from each of the conjuncts is awarded to the bidder. (An
alternative way to achieve this is by opening up the CNF into several bids and
adding a dummy [2] item to each of the bids so that exactly one of them is
selected.) As before we show our encoding with respect to an example.

1. We will have the domain specification as in part 1 and 2 of Section 3.1 and 
the general rules in part 1 and 3 of Section 3.2. 

2. Recall that a CNF bid is not a bundle of items, rather it could be of the
following form: $a = (g1 \oplus h1) \wedge (g2 \oplus h2) \wedge (g3 \oplus h3)$

which means that the bid a can be satisfied by granting one of the items g1
and h1, one of the items g2 and h2 and one of the items g3 and h3. We can
represent this in Smodels as follows:

conj(c1, a). disj(g1, c1). disj(h1, c1).
conj(c2, a). disj(g2, c2). disj(h2, c2).
conj(c3, a). disj(g3, c3). disj(h3, c3).

This will replace the representation in part 3 of Section 3.1.

² In the future we plan to compare the timings using the dlv system with the timings
using the Smodels system.

³ Although the use of CNF is somewhat misleading, we use it to be consistent with
the original terminology in [4].

3. Now it is not enough to just label bids as selected or un-selected. After
labeling a bid as selected we must identify which items are granted as part
of that selected bid. We have the following rules to encode that.

other_granted(X, C, G) ← granted(X, C, G'), G' ≠ G.
granted(X, C, G) ← sel(X), conj(C, X), disj(G, C), not other_granted(X, C, G).

Intuitively, granted(X, C, G) means that as part of the selection of bid X,
to satisfy the conjunct C, item G is granted; and other_granted(X, C, G)
means that some item other than G has been granted. The above two rules
ensure that for any selected bid X and each of its conjuncts C, exactly one item
in that conjunct is granted in each answer set.

4. Because of the difference between a CNF bid and a simple bid consisting of 
a bundle, part 2 of Section 3.2 needs to be replaced by the following rule, 
so as to enforce that we should not select two bids and grant the same item 
with respect to both. 

← bid(X), bid(Y), granted(X, C, G), granted(Y, C', G), X ≠ Y.
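
The intended meaning of the CNF encoding can be illustrated with a brute-force sketch in Python. The three bids and their prices below are hypothetical (the section gives no prices for CNF bids), and the function names are assumptions. For every set of selected bids the sketch tries each way of granting exactly one item per conjunct and rejects grants in which two different bids receive the same item, mirroring the constraint in part 4.

```python
from itertools import combinations, product

# Hypothetical CNF bids: name -> (list of conjuncts, price). Each conjunct is a
# set of items of which exactly one must be granted.
cnf_bids = {
    "a": ([{"g1", "h1"}, {"g2", "h2"}], 10),
    "b": ([{"g1", "g2"}], 7),
    "c": ([{"h1"}, {"h2"}], 6),
}


def grants_for(selection):
    """Yield every way of granting one item per conjunct of each selected bid."""
    conjuncts = [(name, sorted(conjunct))
                 for name in selection
                 for conjunct in cnf_bids[name][0]]
    for choice in product(*(items for _, items in conjuncts)):
        yield [(name, item) for (name, _), item in zip(conjuncts, choice)]


def valid(grant):
    """Two different bids may not be granted the same item."""
    owner = {}
    for name, item in grant:
        if owner.setdefault(item, name) != name:
            return False
    return True


def best_allocation():
    """Return (price, selected bids, grant) with maximal price, by brute force."""
    best = (0, (), [])
    names = sorted(cnf_bids)
    for k in range(1, len(names) + 1):
        for selection in combinations(names, k):
            price = sum(cnf_bids[name][1] for name in selection)
            if price <= best[0]:
                continue
            for grant in grants_for(selection):
                if valid(grant):
                    best = (price, selection, grant)
                    break
    return best


print(best_allocation())
```

In this made-up instance bids a and b can be satisfied together (for instance a receives g1 and h2 while b receives g2), whereas all three bids cannot, so the best total price is 17.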

5 Multi-unit Combinatorial Auction 

Multi-unit combinatorial auction is a generalization of the single unit case, where 
the auctioneer may have multiple identical copies of each item and the bids may 
specify multiple units of each item. The goal here is the same as before: to maximize
the total price that is fetched; but the condition is that the bids should be 
selected such that for any item the total number that is asked by the selected 
bids should not be more than the number that is originally available for that 
item. As before, we describe our Smodels encoding with respect to an example: 
first the specification for a particular domain, and then a set of general rules. 

5.1 Specifying the Domain 

1. The bid names and their values are specified as in part 1 of 3.1. 
bid(a). weight sel(a) = 23. 

bid(b). weight sel(b) = 9. 
bid(c). weight sel(c) = 8. 
bid(d). weight sel(d) = 25. 
bid(e). weight sel(e) = 15. 

2. We specify the items and their initial quantities as follows: 
item(i). item(j). item(k). item(l). 

limit(i,8). limit(j,10). limit(k,6). limit(l,12). 

3. We specify the composition of each bid as follows:

in(i,a,6). in(j,a,4). in(k,a,4).

Intuitively, the above means that bid ‘a’ is for 6 units of item ‘i’, 4 units of
item ‘j’, and 4 units of item ‘k’.

in(j,b,6). in(k,b,4).
in(k,c,2). in(l,c,10).
in(j,d,4). in(k,d,2). in(l,d,4).
in(i,e,6). in(l,e,6).



5.2 The General Rules 

We have the following general rules which, together with the domain specific rules
of the previous subsection, when run using Smodels will give us the winning bids.

1. The following two rules label each bid as either selected or not selected.

sel(X) ← bid(X), not not_sel(X).
not_sel(X) ← bid(X), not sel(X).

2. The following rule defines sel_in(I, X, Z), which intuitively means that bid X
is selected, and Z units of item I are in bid X.

sel_in(I, X, Z) ← item(I), bid(X), sel(X), in(I, X, Z).

3. The following weight declaration assigns the weight Z to the atom
sel_in(X, Y, Z).

weight sel_in(X, Y, Z) = Z.

The above weight assignment is used in the next step to compute the total
quantity of each item in the selected bids.

4. The following rule enforces the constraint that for each item, the total
quantity that is to be encumbered towards the selected bids must be less than or
equal to the initial available quantity of that item.

← Y' [sel_in(I, X, Z) : bid(X) : num(Z)], item(I), limit(I, Y), Y' = Y + 1.

5. As before we have the following optimization statement.

maximize [sel(X) : bid(X)].

When the above program is run through Smodels using the command

lparse file.sm | smodels 0

the system first outputs the answer set {sel(d), sel(a)}, and then outputs
another answer set {sel(e), sel(d), sel(b)} and mentions that the latter one is
optimal.

Thus the Smodels system starts off with a sub-optimal solution, and keeps 
giving better and better solutions until an optimal solution is found. We refer to 
this as exhibiting a weak anytime behavior as after the first solution is found, a 
user may interrupt the system at any time and get a sub-optimal solution which 
improves with time. Since there is no guarantee that the first solution will be 
found within a certain time bound we use the qualifier ‘weak’ with the adjective 
‘anytime’. 
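
The multi-unit example and the idea of a sequence of improving solutions can be mimicked with the following Python sketch. It is a plain exhaustive enumeration, so the order in which it reports solutions is an artifact of the enumeration and need not match the order produced by Smodels; the generator name and the data layout are assumptions of the sketch.

```python
from itertools import combinations

# Available units of each item (the limit facts of Section 5.1).
limit = {"i": 8, "j": 10, "k": 6, "l": 12}

# Bids of Section 5.1: name -> (units requested per item, price).
bids = {
    "a": ({"i": 6, "j": 4, "k": 4}, 23),
    "b": ({"j": 6, "k": 4}, 9),
    "c": ({"k": 2, "l": 10}, 8),
    "d": ({"j": 4, "k": 2, "l": 4}, 25),
    "e": ({"i": 6, "l": 6}, 15),
}


def within_limits(selection):
    """True if, for every item, the selected bids ask for no more than is available."""
    for item, available in limit.items():
        requested = sum(bids[name][0].get(item, 0) for name in selection)
        if requested > available:
            return False
    return True


def improving_solutions():
    """Yield feasible selections, each strictly better than the previous one."""
    best_price = -1
    names = sorted(bids)
    for k in range(len(names) + 1):
        for selection in combinations(names, k):
            price = sum(bids[name][1] for name in selection)
            if price > best_price and within_limits(selection):
                best_price = price
                yield set(selection), price


for selection, price in improving_solutions():
    print(sorted(selection), price)
```

A caller that stops consuming the generator early keeps the best selection seen so far, which is the weak anytime behavior described above. The last selection produced is {b, d, e} with total price 49; the selection {a, d} with price 48 appears earlier in the sequence.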

5.3 Formal Characterization

In the multi-unit case, the auctioneer has $u_j$ units of each item j, $1 \le j \le m$,
and each bid $B_i$ is of the form $((\lambda_i^1, \ldots, \lambda_i^m), p_i)$, where
$\lambda_i^j$ denotes the number of units of item j that is part of the bid $B_i$.
In this case, the winner determination problem [13] is an assignment of bids as
accepted ($x_i = 1$) or not ($x_i = 0$), for $1 \le i \le n$, that satisfies the
constraint

$\sum_{i=1}^{n} (\lambda_i^j \times x_i) \le u_j$ for $1 \le j \le m$; and maximizes $\sum_{i=1}^{n} p_i \times x_i$.

Theorem 2. For a multi-unit combinatorial auction problem with integer prices,
each solution to the winner determination problem corresponds to an answer
set of the encoding described in Sections 5.1-5.2 and vice-versa.

6 Combinatorial Exchanges 

A combinatorial exchange is a further generalization, where we have buyers and 
sellers. The buyers bid as before, while the sellers offer their items for a price. The 
job of the exchange is to accept a subset of the bids of the buyers and sellers such 
that it maximizes the surplus (the amount it obtains from the buyers minus the 
amount it has to pay to the sellers), subject to the condition that for each item,
the total number it obtains from the selected seller bids is at least what it has
to give in lieu of the selected buyer bids. Note that the maximization condition
guarantees that the exchange does not lose money outright. This is because by
not accepting any bids the surplus will be zero. So when the exchange accepts
some bids its surplus would have to be positive. We now describe our Smodels
encoding for multi-unit combinatorial exchanges through a slight modification
of the example in Section 5. The modification is that instead of specifying the
initial quantity of each item, we create a seller f, who offers those quantities for
a price.

1. We have part 1 and part 3 of Section 5.1 and only the items listing of part 2
of Section 5.1. We do not have the description of the initial quantity for the
items. Instead the bid for the seller f is specified as follows:

bid(f). weight sel(f) = -50.
in(i,f,-8). in(j,f,-10). in(k,f,-6). in(l,f,-12).

A seller's bid is distinguished from a buyer's bid by having a negative price
for the whole bid (meaning the seller wants money for those items, instead
of being ready to give a certain amount of money), and similarly the atom
in(i, f, -8) means that the seller f has 8 units of item i to sell, while in(i, a, 6)
would mean that the buyer a wants to buy 6 units of item i.

2. We have part 1, 2, 3, and 5 of Section 5.2 and we replace part 4 by the
following rule.

← Y [sel_in(I, X, Z) : bid(X) : num(Z)] Y, item(I), Y > 0.
The above rule enforces the constraint that for each item I, the total number
encumbered with respect to the selected buyer bids should be less than or
equal to the sum that is available from the selected seller bids. Note that
the use of the same variable Y as the upper and lower bound of the weight
constraint serves the purpose of computing the aggregate.⁴

3. Although the following rule is normally not necessary, as it is taken care of
by the maximize statement, by having it we can exploit the weak anytime
behavior. It also eliminates selections where the exchange may lose money
earlier in the process.

← Y [sel(X) : bid(X)] Y, Y < 0.



When we run the above program through Smodels it tells us to not select any 
bids. This is expected because the maximum amount that can be obtained from 
the buyers is $49 by selecting b, d and e; but to satisfy that we have to select f, 
which costs $50, resulting in a net loss to the exchange. On the other hand if 
we change our example, and assign -45 as the weight of sel(f), then the Smodels 
output is indeed to select b, d, e, and f. 
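
The two outcomes described above can be reproduced with a brute-force sketch in Python. Buyer quantities are positive, seller quantities are negative, and a selection is acceptable when the signed quantities of every item sum to at most zero. The function names are assumptions of the sketch; the data are the buyer bids of Section 5.1 and the seller bid f.

```python
from itertools import combinations


def example(seller_price):
    """Buyer bids a..e of Section 5.1 plus seller f asking `seller_price` (negative)."""
    return {
        "a": ({"i": 6, "j": 4, "k": 4}, 23),
        "b": ({"j": 6, "k": 4}, 9),
        "c": ({"k": 2, "l": 10}, 8),
        "d": ({"j": 4, "k": 2, "l": 4}, 25),
        "e": ({"i": 6, "l": 6}, 15),
        "f": ({"i": -8, "j": -10, "k": -6, "l": -12}, seller_price),
    }


def best_exchange(bids):
    """Return the selection with the largest surplus; the empty selection has surplus 0."""
    items = {item for quantities, _ in bids.values() for item in quantities}
    best, best_surplus = set(), 0
    names = sorted(bids)
    for k in range(1, len(names) + 1):
        for selection in combinations(names, k):
            # Units bought minus units offered must not exceed zero for any item.
            balanced = all(
                sum(bids[name][0].get(item, 0) for name in selection) <= 0
                for item in items)
            surplus = sum(bids[name][1] for name in selection)
            if balanced and surplus > best_surplus:
                best, best_surplus = set(selection), surplus
    return best, best_surplus


print(best_exchange(example(-50)))
print(best_exchange(example(-45)))
```

With the seller asking 50 no selection has a positive surplus, so nothing is selected; with the seller asking 45 the bids b, d, e and f are selected for a surplus of 4.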

6.1 Formal Characterization 

In case of a combinatorial exchange, instead of a single auctioneer, we have many
sellers, who also present bids, but in their bids the $\lambda_i^j$'s and $p_i$'s are
negative numbers denoting the fact that they want to sell (instead of buy) those
items and they want to be paid (rather than being willing to pay). Here the winner
determination problem [13] is an assignment of bids as accepted ($x_i = 1$) or not
($x_i = 0$), for $1 \le i \le n$, that satisfies the constraint

$\sum_{i=1}^{n} (\lambda_i^j \times x_i) \le 0$ for $1 \le j \le m$; and maximizes $\sum_{i=1}^{n} p_i \times x_i$.

Theorem 3. For a multi-unit combinatorial exchange problem with integer 
prices, each solution to the winner determination problem corresponds to an 
optimal answer set of the above encoding and vice-versa. 

7 Expressing Additional Constraints 

In this section we show how further generalizations and additional constraints 
can be easily expressed in Smodels. 

⁴ But the Smodels requirement of having a domain variable for Y (not shown in the
above rule) makes it an inefficient way to compute aggregation. Having an efficient
computation of aggregates together with the answer set semantics remains a
challenge.
1. Suppose we would like to express the constraint that item 1 must be sold.
We can achieve this by adding the following rules:

sold(X) ← item(X), bid(Y), sel(Y), in(X, Y).
← not sold(1).

2. Suppose we would like to have reserve prices⁵ in the single unit combinatorial
auction. This can be encoded by the following modification of the program in
Section 3. The main change is that we replace bid(X) by bid(X, Y) where Y
was originally the weight of bid(X). This change allows us to compare the
sum of the reserve prices of the items in a bid with the bid price, which now
is the parameter Y instead of the weight of bid(X).

As regards the specification of the domain, the bids are specified as follows:

bid(a,24). bid(b,9). bid(c,8). bid(d,25). bid(e,15).

The items and the composition of the bids are as in parts 2 and 3 of Section 3.1.
The general rules, as described below, are different from the ones in Section 3.2.

(a) The following two rules label each bid as either selected or not selected.
The third rule assigns a weight to sel(X, Y).

sel(X, Y) ← bid(X, Y), not not_sel(X, Y).
not_sel(X, Y) ← bid(X, Y), not sel(X, Y).
weight sel(X, Y) = Y.

(b) The following enforces the constraint that two different bids containing
the same item can not both be selected.

← sel(X, N), sel(Y, N'), X ≠ Y, item(I), in(I, X), in(I, Y).

(c) We have the following optimization statement.

maximize [sel(X, Y) : bid(X, Y)].

(d) We express the reserve price of each item by the following:

rp(1,2). rp(2,8). rp(3,8). rp(4,12).

(e) The following rules compute the sum of the reserve prices of the items in
each bid, compare it with the bid price, and eliminate possible answer sets
where the bid price of a selected bid is less than the sum of the reserve
prices of the items in that bid.

in_rp(Item, Bid, Res_pr) ← in(Item, Bid), rp(Item, Res_pr).
weight in_rp(Item, Bid, Res_pr) = Res_pr.
item_num(X, Y) ← item(X), num(Y).

← C [in_rp(Item, Bid, Res_pr) : item_num(Item, Res_pr)] C,
bid(Bid, Bid_pr), sel(Bid, Bid_pr), Bid_pr < C.

3. Suppose we would like to have a constraint that items 1 and 3 must not go
to the same bidder. In the simple case, if we assume that each bid is by a
different bidder, we can encode this by the following rule.

← bid(X, Y), sel(X, Y), in(1, X), in(3, X).

⁵ In simple auctions the reserve price of an item is the minimum price a seller would
accept for that item. Its extension [13] to combinatorial auctions will become clear below.
4. In the more general case where each bid has an associated bidder we first
need to express this association as follows:

bidder(a, john). bidder(b, mary). bidder(c, john).
bidder(d, mary). bidder(e, peter).

Next we need the following rules:

goes_to(Item, Bidder) ← in(Item, X), bidder(X, Bidder), sel(X, Y).
← goes_to(1, B), goes_to(3, B).

Similarly, if we want to specify that the items 1 and 3 must go to the same
bidder, then the last rule can be replaced by the following rules.

← goes_to(1, B), not goes_to(3, B).
← goes_to(3, B), not goes_to(1, B).

5. Suppose we would like to represent the constraint that every bidder must
return happy, i.e., at least one of her bids must be satisfied. This can be
expressed by the following:

happy(Bidder) ← bidder(X, Bidder), bid(X, Y), sel(X, Y).
← bidder(Bid, Bidder), not happy(Bidder).

6. Suppose the seller wants to deal only with wholesalers, i.e., it wants to
have the constraint that it only selects bids of a bidder if the total money
to be obtained from that bidder is more than $100. This can be achieved by
adding the following rules.

sel(Bid, Value, Bidder) ← bid(Bid, Value), sel(Bid, Value),
bidder(Bid, Bidder).
weight sel(Bid, Value, Bidder) = Value.
total(Bidder, C) ← C [sel(Bid, Value, Bidder) : bid(Bid, Value)] C.
← total(Bidder, C), C < 100.

7. Suppose the seller wants to avoid bid ‘a’ as it came late, unless it includes
an item that is not included in any other bid. This can be expressed by the
following rules.

ow_covered(Bid, Item) ← in(Item, Bid'), Bid ≠ Bid'.
not_ow_covered(Bid) ← in(Item, Bid), not ow_covered(Bid, Item).
← sel(a, Value), not not_ow_covered(a).

8. To check inventory costs the seller may require that no more than 5 unsold
items should be left after the selection. This can be expressed by the following
rules.

sold(I) ← item(I), bid(X, Y), sel(X, Y), in(I, X).
unsold(I) ← item(I), not sold(I).
← C {unsold(I) : item(I)} C, C > 5.

9. To contain shipping and handling costs the seller may require that bids
should be accepted such that at least 5 items go to each bidder. This can be
expressed by the following rules.

count(Bidder, C) ← C {goes_to(Item, Bidder) : item(Item)} C,
bidder(B, Bidder).
← bidder(B, Bidder), count(Bidder, C), C < 5.

10. If item ‘a’ is a family treasure the seller may require that it can only be sold
to bidder john or mary, his relatives. This can be expressed by the following
rule.

← goes_to(a, X), X ≠ john, X ≠ mary.
The above shows how additional constraints and generalizations can be easily
expressed as new Smodels rules and often we do not have to change the original
program, but just have to add new rules.
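
The same point can be illustrated outside Smodels. In the Python sketch below each constraint is a separate predicate over a selection of bids, and adding a constraint means adding one predicate to a list while the search routine stays unchanged. This is only a brute-force analogue of two of the constraints above (item 1 must be sold, and reserve prices), using the bids of Section 3 and the reserve prices of part 2(d); the function names are assumptions of the sketch.

```python
from itertools import combinations

# Bids of Section 3 and reserve prices of part 2(d).
bids = {
    "a": ({1, 2, 3}, 24),
    "b": ({2, 3}, 9),
    "c": ({3, 4}, 8),
    "d": ({2, 3, 4}, 25),
    "e": ({1, 4}, 15),
}
reserve = {1: 2, 2: 8, 3: 8, 4: 12}


def no_shared_item(selection):
    """Base constraint: no item is contained in two selected bids."""
    items = [item for name in selection for item in bids[name][0]]
    return len(items) == len(set(items))


def item_1_sold(selection):
    """Additional constraint: item 1 must be sold."""
    return any(1 in bids[name][0] for name in selection)


def meets_reserve(selection):
    """Additional constraint: each selected bid pays at least the sum of the
    reserve prices of its items."""
    return all(bids[name][1] >= sum(reserve[i] for i in bids[name][0])
               for name in selection)


def solve(constraints):
    """Best selection satisfying all given constraints, by exhaustive search."""
    best, best_price = None, None
    names = sorted(bids)
    for k in range(len(names) + 1):
        for selection in combinations(names, k):
            if not all(check(selection) for check in constraints):
                continue
            price = sum(bids[name][1] for name in selection)
            if best_price is None or price > best_price:
                best, best_price = set(selection), price
    return best, best_price


print(solve([no_shared_item]))
print(solve([no_shared_item, meets_reserve]))
print(solve([no_shared_item, item_1_sold]))
```

With reserve prices only bids a and e remain eligible, and since they share item 1 the best selection is a alone with price 24. Requiring item 1 to be sold also gives a best price of 24.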

8 Conclusion 

In this paper we showed how the combinatorial auction problem and its
generalizations can be expressed and solved using the declarative knowledge
representation language of Smodels. We argued that the declarativeness of
Smodels allows us to easily make generalizations and add additional constraints.
Although our focus was more on knowledge representation, we ran some experiments
with respect to synthetic examples following the approach of [6,12]. In case of
single-unit bids, our results have been comparable to those reported in [12]. In
case of multi-unit bids with synthetic data drawn from a decay distribution [6]
we obtained reasonable timings for bundle sizes up to 1500, with 150 items. Our
timings were worse than those in [6] though. We did not compare with the timings
in [4,13], as the first one is about an incomplete algorithm and the second one
does not report timings. We hope the programs in this paper will serve as a
benchmark and a challenge to researchers in logic programming, declarative
problem solving and knowledge representation in terms of having faster
implementations of Smodels.
Abstract. In this paper bounded model checking of asynchronous concurrent
systems is introduced as a promising application area for answer set
programming. As the model of asynchronous systems, a generalization of
communicating automata, namely 1-safe Petri nets, is used. It is shown how
a 1-safe Petri net and a requirement on the behavior of the net can be 
translated into a logic program such that the bounded model checking 
problem for the net can be solved by computing stable models of the 
corresponding program. The use of the stable model semantics leads to 
compact encodings of bounded reachability and deadlock detection tasks 
as well as the more general problem of bounded model checking of linear 
temporal logic. Some experimental results on solving deadlock detection 
problems using the translation and the Smodels system are presented. 



1 Introduction 

In this paper we put forward symbolic model checking [2,3] as a promising appli- 
cation area for answer set programming systems. In particular, we demonstrate 
how bounded model checking problems of asynchronous concurrent systems can 
be reduced to computing stable models of logic programs. 

Verification of asynchronous systems is typically done by enumerating the 
set of reachable states of the system. Tools based on this approach (with various 
enhancements) include, e.g., the Spin system [12], which supports extended state 
machines communicating through FIFO queues, and the PROD tool [17] based 
on Petri nets. The main problem with enumerative model checkers is the amount 
of memory needed to store the set of reachable states. 

Symbolic model checking is widely applied especially in hardware verification.
The main analysis technique is based on (ordered) binary decision diagrams
(BDDs). In many cases the set of reachable states can be represented very
compactly using a BDD encoding. Although the approach has been successful, there

* This is an extended version of a paper titled “Answer Set Programming and Bounded
Model Checking” [11] presented at the AAAI Spring 2001 Symposium on Answer Set
Programming, Stanford, March 2001. The financial support of the Academy of Finland
(Projects 43963, 47754) and Tekniikan Edistämissäätiö is gratefully acknowledged.
are difficulties in applying BDD-based techniques, in particular, in areas outside
hardware verification. The key problem is that some Boolean functions do not 
have a compact representation as BDDs and the size of the BDD representation 
of a Boolean function is very sensitive to the variable ordering used. Bounded 
model checking [1] has been proposed as a technique for overcoming the space 
problem by replacing BDDs with satisfiability (SAT) checking techniques because
typical SAT checkers use only a polynomial amount of memory. The idea is
roughly the following. Given a sequential digital circuit, a (temporal) property 
to be verified, and a bound n, the behavior of a sequential circuit is unfolded up 
to $n$ steps as a Boolean formula $S$ and the negation of the property to be
verified is represented as a Boolean formula $R$. The translation to Boolean formulae
is done so that $S \land R$ is satisfiable iff the system has a behavior violating the
property of length at most n. Hence, bounded model checking provides directly 
interesting and practically relevant benchmarks for any answer set programming 
system capable of handling propositional satisfiability problems. 

Until now bounded model checking has been applied to synchronous hard- 
ware verification and little attention has been given to knowledge representation 
issues such as developing concise and efficient logical representation of system be- 
havior. In this work we study the knowledge representation problem and employ 
ideas used in reducing planning to stable model computation [15]. The aim is to 
develop techniques such that the behavior of an asynchronous concurrent system 
can be encoded compactly and the inherent concurrency in the system could be 
exploited in model checking the system. To illustrate the approach we use a 
simple basic Petri net model of asynchronous systems, 1-safe Place/Transition 
nets, which is an interesting generalization of communicating automata [5] . 

The structure of the rest of the paper is the following. In the next section 
we introduce Petri nets and the bounded model checking problem. Then we de- 
velop a compact encoding of bounded model checking as the problem of finding 
stable models of logic programs. We first show how to treat reachability prop- 
erties such as deadlocks and then demonstrate how to extend the approach to 
cope with properties expressed in linear temporal logic (LTL). We discuss initial 
experimental results and end with some concluding remarks. 



2 Petri Nets and Bounded Model Checking 

We will now introduce P/T-nets. They are one of the simplest forms of Petri 
nets. We will use as a running example the P/T-net presented in Fig. 1. 

A triple $(P, T, F)$ is a net if $P \cap T = \emptyset$ and $F \subseteq (P \times T) \cup (T \times P)$. The
elements of $P$ are called places, and the elements of $T$ transitions. Places and
transitions are also called nodes. The places are represented in graphical notation
by circles, transitions by squares, and the flow relation $F$ with arcs. We identify $F$
with its characteristic function on the set $(P \times T) \cup (T \times P)$. The preset of a
node $x$, denoted by ${}^\bullet x$, is the set $\{y \in P \cup T \mid F(y, x) = 1\}$. In our running
example, e.g., ${}^\bullet t2 = \{p1, p2\}$. The postset of a node $x$, denoted by $x^\bullet$, is the set
$\{y \in P \cup T \mid F(x, y) = 1\}$. Again in our running example $p2^\bullet = \{t2, t3, t5\}$.
Fig. 1. Running Example



A marking of a net $(P, T, F)$ is a mapping $P \to \mathbb{N}$. A marking $M$ is identified
with the multi-set which contains $M(p)$ copies of $p$ for every $p \in P$. A 4-tuple
$\Sigma = (P, T, F, M_0)$ is a net system (also called a P/T-net) if $(P, T, F)$ is a net
and $M_0$ is a marking of $(P, T, F)$. A marking is graphically denoted by a distribution of
tokens on the places of the net. In our running example in Fig. 1 the net has the
initial marking $M_0 = \{p1, p2\}$.

A marking $M$ enables a transition $t \in T$ if $\forall p \in P : F(p, t) \le M(p)$. If $t$ is
enabled, it can occur leading to a new marking (denoted $M \xrightarrow{t} M'$), where $M'$ is
defined by $\forall p \in P : M'(p) = M(p) - F(p, t) + F(t, p)$. In the running example $t2$
is enabled in the initial marking $M_0$, and thus $M_0 \xrightarrow{t2} M'$, where $M' = \{p3, p4\}$.
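
The enabling and firing rule can be illustrated with a minimal Python sketch. It assumes that 1-safe markings are represented as sets of marked places, and it uses presets and postsets for the running example that are reconstructed from the rules of Fig. 2 (an assumption, since the drawing of Fig. 1 is not reproduced here). The names `PRE`, `POST`, `enabled` and `fire` are hypothetical.

```python
# Presets and postsets of the running example, reconstructed from Fig. 2
# (assumption: the drawing of Fig. 1 is not available in this text).
PRE = {"t1": {"p3"}, "t2": {"p1", "p2"}, "t3": {"p2"}, "t4": {"p4"}, "t5": {"p2"}}
POST = {"t1": {"p1"}, "t2": {"p3", "p4"}, "t3": {"p5"}, "t4": {"p2"}, "t5": {"p5"}}


def enabled(marking, t):
    """t is enabled if every place in its preset carries a token."""
    return PRE[t] <= marking


def fire(marking, t):
    """M'(p) = M(p) - F(p,t) + F(t,p), specialised to 1-safe markings as sets."""
    if not enabled(marking, t):
        raise ValueError(f"{t} is not enabled")
    return frozenset((marking - PRE[t]) | POST[t])


M0 = frozenset({"p1", "p2"})
print(sorted(fire(M0, "t2")))  # ['p3', 'p4']
```

Representing a marking as a set is only valid because the nets considered here are 1-safe; general P/T-nets would need a multiset.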

A marking $M_n$ is reachable in $\Sigma$ if there is an execution, i.e., a (possibly
empty) sequence of transitions $t_1, t_2, \ldots, t_n$ and markings $M_1, M_2, \ldots, M_{n-1}$
such that: $M_0 \xrightarrow{t_1} M_1 \xrightarrow{t_2} \cdots M_{n-1} \xrightarrow{t_n} M_n$. A marking $M$ is reachable within
a bound $n$, if there is an execution with at most $n$ transitions, with which $M$ is
reachable.

A marking $M$ is 1-safe if $\forall p \in P : M(p) \le 1$. A P/T-net is 1-safe if all its
reachable markings are 1-safe. We will restrict ourselves to finite P/T-nets which 
are 1-safe, and in which each transition has both nonempty pre- and postsets. 

Given a 1-safe P/T-net $\Sigma$, we say that a set of transitions $S \subseteq T$ is concurrently
enabled in the marking $M$, if (i) all transitions $t \in S$ are enabled in $M$,
and (ii) for all pairs of transitions $t, t' \in S$, such that $t \ne t'$, it holds that
${}^\bullet t \cap {}^\bullet t' = \emptyset$. If a set $S$ is concurrently enabled in the marking $M$, we can fire it
in a step (denoted $M \xrightarrow{S} M'$), where $M'$ is the marking reached after firing all
of the transitions in the step $S$ in arbitrary order. It is easy to prove by using
the 1-safeness of the P/T-net $\Sigma$ that all possible interleavings of transitions in a
step $S$ are enabled in $M$, and that they all lead to the same final marking $M'$. In
our running example in the marking $M' = \{p3, p4\}$ the step $\{t1, t4\}$ is enabled,
and will lead back to the initial marking $M_0$. This is denoted by $M' \xrightarrow{\{t1, t4\}} M_0$.
Notice also that for any enabled transition, the singleton set containing only
that transition is always (trivially) a step.
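
A minimal Python sketch of concurrent enabling and step firing follows. As before, markings are sets of places and the net structure is reconstructed from Fig. 2 (an assumption); the function names are hypothetical.

```python
from itertools import combinations

# Running example, reconstructed from Fig. 2 (assumption).
PRE = {"t1": {"p3"}, "t2": {"p1", "p2"}, "t3": {"p2"}, "t4": {"p4"}, "t5": {"p2"}}
POST = {"t1": {"p1"}, "t2": {"p3", "p4"}, "t3": {"p5"}, "t4": {"p2"}, "t5": {"p5"}}


def concurrently_enabled(marking, step):
    """Conditions (i) and (ii): all enabled, presets pairwise disjoint."""
    all_enabled = all(PRE[t] <= marking for t in step)
    disjoint = all(not (PRE[a] & PRE[b]) for a, b in combinations(sorted(step), 2))
    return all_enabled and disjoint


def fire_step(marking, step):
    """Fire the transitions of a step one after another (any order works)."""
    if not concurrently_enabled(marking, step):
        raise ValueError("step is not concurrently enabled")
    current = set(marking)
    for t in sorted(step):
        current = (current - PRE[t]) | POST[t]
    return frozenset(current)


print(sorted(fire_step({"p3", "p4"}, {"t1", "t4"})))  # ['p1', 'p2']
```

The set `{t2, t3}` is rejected in the initial marking because both transitions consume the token in `p2`.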
We say that a marking $M_n$ is reachable in step semantics in a 1-safe P/T-net if
there is a step execution, i.e., a (possibly empty) sequence $S_1, S_2, \ldots, S_n$ of steps
and $M_1, M_2, \ldots, M_{n-1}$ of markings such that: $M_0 \xrightarrow{S_1} M_1 \xrightarrow{S_2} \cdots M_{n-1} \xrightarrow{S_n} M_n$.
A marking $M$ is reachable within a bound $n$ in the step semantics, if there is a
step execution with at most $n$ steps, with which $M$ is reachable.

We will refer to the “normal semantics” as interleaving semantics. Note that 
if a marking is reachable in n transitions in the interleaving semantics, it is 
also reachable in n steps in the step semantics. However, the converse does not 
necessarily hold. We have, however, the following theorem. 

Theorem 1. For finite 1-safe P/T-nets the set of reachable markings in the 
interleaving and step semantics coincide. 
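
Theorem 1 and the remark on bounds can be checked by brute force on the running example. The sketch below is an illustration only: the net structure is reconstructed from Fig. 2 (an assumption), the search starts from the marking $\{p3, p4\}$ rather than from $M_0$ so that the two bounds differ, and all function names are hypothetical.

```python
from itertools import combinations

# Running example, reconstructed from Fig. 2 (assumption).
PRE = {"t1": {"p3"}, "t2": {"p1", "p2"}, "t3": {"p2"}, "t4": {"p4"}, "t5": {"p2"}}
POST = {"t1": {"p1"}, "t2": {"p3", "p4"}, "t3": {"p5"}, "t4": {"p2"}, "t5": {"p5"}}


def steps(marking):
    """All nonempty concurrently enabled sets of transitions."""
    en = sorted(t for t in PRE if PRE[t] <= marking)
    for k in range(1, len(en) + 1):
        for s in combinations(en, k):
            if all(not (PRE[a] & PRE[b]) for a, b in combinations(s, 2)):
                yield s


def successors(marking, step_semantics):
    """Successor markings; interleaving semantics keeps singleton steps only."""
    for s in steps(marking):
        if step_semantics or len(s) == 1:
            m = set(marking)
            for t in s:
                m = (m - PRE[t]) | POST[t]
            yield frozenset(m)


def distances(start, step_semantics):
    """Breadth-first search: minimal bound for every reachable marking."""
    dist = {start: 0}
    frontier = [start]
    while frontier:
        nxt = []
        for m in frontier:
            for m2 in successors(m, step_semantics):
                if m2 not in dist:
                    dist[m2] = dist[m] + 1
                    nxt.append(m2)
        frontier = nxt
    return dist


start = frozenset({"p3", "p4"})
inter = distances(start, step_semantics=False)
step = distances(start, step_semantics=True)
target = frozenset({"p1", "p2"})
print(set(inter) == set(step), inter[target], step[target])  # True 2 1
```

Both searches find the same six markings, but $\{p1, p2\}$ needs two transitions in the interleaving semantics and only one step in the step semantics.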

Linear temporal logic (LTL). The linear temporal logic LTL is one of the most
widely used logics for specifying properties of reactive systems [3]. The basic idea
is to specify properties that the system should have using LTL. A model checker 
is then used to check whether all (infinite) behaviors of the system are models 
of the specification formula. If not, then the model checker outputs a behavior 
of the system which violates the given specification. 

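Given a finite set $AP$ of atomic propositions, the syntax of LTL^1 is given by:

$\varphi ::= p \in AP \mid \neg\varphi_1 \mid \varphi_1 \lor \varphi_2 \mid \varphi_1 \land \varphi_2 \mid \varphi_1\, U\, \varphi_2 \mid \varphi_1\, R\, \varphi_2$.

An $\omega$-word over $2^{AP}$ is an infinite sequence $w = x_0 x_1 \ldots$ such that $x_i \in 2^{AP}$
for all $i \ge 0$. For an $\omega$-word $w$ we define $w_{(i)} = x_i$, and denote by $w^{(i)}$ the suffix
of $w$ starting at $x_i$. We define the relation $w \models \varphi$ inductively as follows: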

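- $w \models p$ iff $p \in w_{(0)}$ for $p \in AP$

- $w \models \neg\varphi_1$ iff not $w \models \varphi_1$

- $w \models \varphi_1 \lor \varphi_2$ iff $w \models \varphi_1$ or $w \models \varphi_2$

- $w \models \varphi_1 \land \varphi_2$ iff $w \models \varphi_1$ and $w \models \varphi_2$

- $w \models \varphi_1\, U\, \varphi_2$ iff there exists a $j \ge 0$ such that $w^{(j)} \models \varphi_2$ and for all $0 \le i < j$,
$w^{(i)} \models \varphi_1$

- $w \models \varphi_1\, R\, \varphi_2$ iff for all $j \ge 0$, if for every $i < j$ $w^{(i)} \not\models \varphi_1$ then $w^{(j)} \models \varphi_2$.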

We define some shorthand LTL formulas: $\top = p \lor \neg p$ for some arbitrary fixed
$p \in AP$, $\bot = \neg\top$, $\Diamond\varphi = (\top\, U\, \varphi)$, $\Box\varphi = (\bot\, R\, \varphi)$, and $\varphi_1 \to \varphi_2 = \neg\varphi_1 \lor \varphi_2$.

The temporal operators are called: $U$ for “until”, $R$ for “release”, $\Diamond$ for
“eventually”, and $\Box$ for “globally”. Some examples of practical use of LTL formulas
in specification are: $\Box\neg(cs_1 \land cs_2)$ (it always holds that two processes
are not at the same time in a critical section), $\Box(req \to \Diamond ack)$ (it is always
the case that a request is eventually followed by an acknowledgement), and
$((\Box\Diamond sch_1) \land (\Box\Diamond sch_2)) \to (\Box(tr_1 \to \Diamond cs_1))$ (if both process 1 and 2 are scheduled
infinitely often, then always the entering of process 1 in the trying section
is followed by the process 1 eventually entering the critical section).
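
The semantics above can be made concrete for ultimately periodic words, i.e., a finite sequence of states whose last position loops back to an earlier one. The following Python sketch is an illustration under stated assumptions and is not the translation used in this paper: formulas are encoded as nested tuples (a hypothetical encoding), $U$ is evaluated as a least fixed point and $R$ as a greatest fixed point over the finitely many positions.

```python
# Hypothetical formula encoding: ("ap", name), ("not", f), ("or", f, g),
# ("and", f, g), ("U", f, g), ("R", f, g).

def evaluate(formula, states, loop):
    """Truth value of `formula` at every position of the word
    states[0] ... states[k-1], where position k-1 is followed by `loop`."""
    k = len(states)
    succ = list(range(1, k)) + [loop]
    op = formula[0]
    if op == "ap":
        return [formula[1] in s for s in states]
    if op == "not":
        return [not v for v in evaluate(formula[1], states, loop)]
    left = evaluate(formula[1], states, loop)
    right = evaluate(formula[2], states, loop)
    if op == "or":
        return [a or b for a, b in zip(left, right)]
    if op == "and":
        return [a and b for a, b in zip(left, right)]
    if op not in ("U", "R"):
        raise ValueError(f"unknown operator {op!r}")
    # U starts from all-false (least fixed point), R from all-true (greatest).
    val = [op == "R"] * k
    changed = True
    while changed:
        changed = False
        for i in reversed(range(k)):
            if op == "U":
                new = right[i] or (left[i] and val[succ[i]])
            else:
                new = right[i] and (left[i] or val[succ[i]])
            if new != val[i]:
                val[i], changed = new, True
    return val


TOP = ("or", ("ap", "p"), ("not", ("ap", "p")))
BOTTOM = ("not", TOP)


def eventually(f):
    return ("U", TOP, f)


def globally(f):
    return ("R", BOTTOM, f)


word = [{"cs1"}, set()]  # process 1 is in its critical section only initially
print(evaluate(eventually(("ap", "cs1")), word, loop=1)[0])            # True
print(evaluate(globally(eventually(("ap", "cs1"))), word, loop=1)[0])  # False
```

With `loop=0` the same two states repeat forever, so `cs1` holds infinitely often and the second formula becomes true.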

^1 Note that we do not define the often used next-time operator $X \varphi$. This is a tradeoff
which allows the use of step semantics.
Given a 1-safe P/T-net $\Sigma$, we use a chosen subset of the places as the atomic
propositions $AP$. An infinite (interleaving) execution $M_0 \xrightarrow{t_1} M_1 \xrightarrow{t_2} \cdots$ satisfies $\varphi$
iff the corresponding $\omega$-word $w = (M_0 \cap AP), (M_1 \cap AP), \ldots$ satisfies $\varphi$. We say
that $\Sigma$ satisfies $\varphi$ iff every infinite execution starting from the initial marking $M_0$
satisfies $\varphi$. Alternatively, $\Sigma$ does not satisfy $\varphi$ if there exists an infinite execution
starting from $M_0$ which satisfies $\neg\varphi$. We call such an execution a counterexample.

The temporal logic LTL specifies properties of infinite executions. In many 
cases it suffices to reason about simple temporal properties. A typical example 
is the reachability of a marking satisfying some condition $C$ which roughly corresponds
to finding a counterexample for a formula $\Box\neg C$. An important reachability
based property is deadlock detection.

Definition 1. (Deadlock) Given a 1-safe P/T-net $\Sigma$, is there a reachable
marking $M$ which does not enable any transition of $\Sigma$?

Most analysis questions including deadlock detection and LTL model check- 
ing are PSPACE-complete in the size of a 1-safe Petri net, see e.g., [6]. In bounded 
model checking we fix a bound n and look for counterexamples which are shorter 
than the given bound n. For example, in the case of bounded deadlock detection 
in step semantics we look for step executions reaching a deadlock in n steps. 
It is easy to show that, e.g., the bounded deadlock detection problem in step 
semantics is NP-complete (when the bound n is given in unary coding). 
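
As a point of reference for the encodings that follow, bounded deadlock detection in step semantics can be sketched as a plain explicit-state search in Python. This is not the answer set programming approach of this paper, only a brute-force illustration; the net is the running example reconstructed from Fig. 2 (an assumption) and the function names are hypothetical.

```python
from itertools import combinations

# Running example, reconstructed from Fig. 2 (assumption).
PRE = {"t1": {"p3"}, "t2": {"p1", "p2"}, "t3": {"p2"}, "t4": {"p4"}, "t5": {"p2"}}
POST = {"t1": {"p1"}, "t2": {"p3", "p4"}, "t3": {"p5"}, "t4": {"p2"}, "t5": {"p5"}}


def step_successors(marking):
    """Markings reachable by firing one nonempty concurrently enabled set."""
    en = sorted(t for t in PRE if PRE[t] <= marking)
    for k in range(1, len(en) + 1):
        for s in combinations(en, k):
            if all(not (PRE[a] & PRE[b]) for a, b in combinations(s, 2)):
                m = set(marking)
                for t in s:
                    m = (m - PRE[t]) | POST[t]
                yield frozenset(m)


def bounded_deadlock(m0, n):
    """Return a deadlocked marking reachable within at most n steps, or None."""
    frontier = {frozenset(m0)}
    for _ in range(n + 1):
        for m in frontier:
            if not any(PRE[t] <= m for t in PRE):
                return m
        frontier = {m2 for m in frontier for m2 in step_successors(m)}
    return None


print(bounded_deadlock({"p1", "p2"}, 0))          # None
print(sorted(bounded_deadlock({"p1", "p2"}, 1)))  # ['p1', 'p5']
```

The search keeps whole frontiers of markings in memory, which is exactly the cost that the symbolic encodings aim to avoid.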

This idea can also be applied to LTL model checking. Biere et al. [1] introduce
bounded LTL model checking. They also discuss how to ensure that a given 
bound n is sufficient to guarantee completeness. Unfortunately, getting an exact 
bound is often computationally infeasible, and easily obtainable upper bounds 
are too large. In the case of 1-safe P/T-nets they are exponential in the number 
of places in the net. Therefore the bounded model checking results are usually 
not conclusive if a counterexample is not found. Thus bounded model checking 
is at its best in “bug hunting” , and not as easily applicable in verifying systems 
to be correct. 

3 From Bounded Model Checking to Answer Set 
Programming 

In this section we show how to solve bounded LTL model checking problems using 
answer set programming. We start with the simpler reachability properties and 
then extend the approach to handle full LTL model checking. 

For encoding bounded model checking problems we use normal logic pro- 
grams with the stable model semantics [8] . A normal rule is of the form 

$a \leftarrow b_1, \ldots, b_m, \mathrm{not}\ c_1, \ldots, \mathrm{not}\ c_n$ (1)

where each $a$, $b_i$, $c_j$ is a ground atom. We employ three extensions which can be
seen as compact shorthands for normal rules. We use integrity constraints, i.e.,
rules with empty head. Such a constraint like the one on the left can be taken
as a shorthand for a rule given on the right

$\leftarrow b, \mathrm{not}\ c$        $f \leftarrow \mathrm{not}\ f, b, \mathrm{not}\ c$

where $f$ is a new atom. For expressing the choice whether to include an atom in
a stable model we use choice rules. They are normal rules where the head is in 
brackets with the idea that the head can be included in a stable model only if 
the body holds but it can be left out, too. Such a construct can be represented 
using normal rules by introducing a new atom. For example, the choice rule on 
the left corresponds to the two normal rules on the right where a' is a new atom. 

$\{a\} \leftarrow b, \mathrm{not}\ c$        $a \leftarrow \mathrm{not}\ a', b, \mathrm{not}\ c$

                                            $a' \leftarrow \mathrm{not}\ a$

Finally, a compact encoding of conflicts is needed, i.e., rules of the form 

$\leftarrow 2\{a_1, \ldots, a_n\}$ (2)

saying that a stable model cannot contain any two atoms out of a set of atoms
$\{a_1, \ldots, a_n\}$. Such a rule can be expressed, e.g., by adding a rule $f \leftarrow \mathrm{not}\ f, a_i, a_j$
for each pair $a_i, a_j$ from $\{a_1, \ldots, a_n\}$, i.e., using $O(n^2)$ rules. Choice and conflict
rules are simple cases of cardinality constraint rules [16]. The Smodels system
(http://www.tcs.hut.fi/Software/smodels/) provides an implementation
for cardinality constraint rules and includes primitives supporting directly
such constraints without translating them first to corresponding normal rules.
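
The shorthands can be tried out with a tiny brute-force stable model enumerator. The Python sketch below is only an illustration for very small ground programs (it is not how Smodels works): it tests every candidate set of atoms against the least model of its Gelfond-Lifschitz reduct. The rule representation and the atom name `a_prime` are hypothetical.

```python
from itertools import combinations


def least_model(definite_rules):
    """Least model of a negation-free program given as (head, positive_body)."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in definite_rules:
            if head not in model and set(pos) <= model:
                model.add(head)
                changed = True
    return model


def stable_models(rules):
    """Rules are triples (head, positive_body, negative_body) of ground atoms."""
    atoms = sorted({h for h, _, _ in rules} | {a for _, p, n in rules for a in p + n})
    found = []
    for k in range(len(atoms) + 1):
        for cand in combinations(atoms, k):
            cand = set(cand)
            # Reduct: drop rules blocked by the candidate, then drop "not" literals.
            reduct = [(h, p) for h, p, n in rules if not (set(n) & cand)]
            if least_model(reduct) == cand:
                found.append(cand)
    return found


# Fact b together with the translation of the choice rule {a} <- b, not c.
choice = [
    ("b", (), ()),
    ("a", ("b",), ("a_prime", "c")),
    ("a_prime", (), ("a",)),
]
print(stable_models(choice))

# Integrity constraint <- a, written as f <- not f, a.
constrained = choice + [("f", ("a",), ("f",))]
print(stable_models(constrained))
```

The first program has two stable models, one containing `a` and one containing `a_prime`; adding the constraint removes the model containing `a`.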

3.1 Reachability Checking 

Now we devise a method for translating bounded reachability problems of 1-safe
P/T-nets to tasks of finding stable models. Consider a net $N = (P, T, F)$ and a
step bound $n \ge 1$. We construct a logic program $\Pi_A(N, n)$, which captures the
possible executions of up to $n$ steps, as follows.

- For each place $p \in P$, include a choice rule $\{p(0)\} \leftarrow$.

- For each transition $t \in T$, and for all $i = 0, 1, \ldots, n-1$, include a rule

$\{t(i)\} \leftarrow p_1(i), \ldots, p_l(i)$ (3)

where $\{p_1, \ldots, p_l\}$ is the preset of $t$. Hence, a stable model can contain a
transition instance in step $i$ only if its preset holds at step $i$.

- For each place $p \in P$, for each transition $t_k$ in the preset of $p$, and for all
$i = 0, 1, \ldots, n-1$, include a rule

$p(i+1) \leftarrow t_k(i)$. (4)

These say that $p$ holds in the next step if at least one of its preset transitions
is in the current step.
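$\{p1(0)\} \leftarrow$    $\{t1(i)\} \leftarrow p3(i)$           $p1(i+1) \leftarrow t1(i)$    $p1(i+1) \leftarrow p1(i), \mathrm{not}\ t2(i)$
$\{p2(0)\} \leftarrow$    $\{t2(i)\} \leftarrow p1(i), p2(i)$    $p2(i+1) \leftarrow t4(i)$    $p2(i+1) \leftarrow p2(i), \mathrm{not}\ t2(i), \mathrm{not}\ t3(i), \mathrm{not}\ t5(i)$
$\{p3(0)\} \leftarrow$    $\{t3(i)\} \leftarrow p2(i)$           $p3(i+1) \leftarrow t2(i)$    $p3(i+1) \leftarrow p3(i), \mathrm{not}\ t1(i)$
$\{p4(0)\} \leftarrow$    $\{t4(i)\} \leftarrow p4(i)$           $p4(i+1) \leftarrow t2(i)$    $p4(i+1) \leftarrow p4(i), \mathrm{not}\ t4(i)$
$\{p5(0)\} \leftarrow$    $\{t5(i)\} \leftarrow p2(i)$           $p5(i+1) \leftarrow t3(i)$    $p5(i+1) \leftarrow p5(i)$
                                                                 $p5(i+1) \leftarrow t5(i)$

$\leftarrow 2\{t2(i), t3(i), t5(i)\}$        where $i = 0, 1, \ldots, n-1$

Fig. 2. Program $\Pi_A(N, n)$
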



- For each place $p \in P$, and for all $i = 0, 1, \ldots, n-1$, include a rule

$\leftarrow 2\{t_1(i), \ldots, t_l(i)\}$ (5)

where $\{t_1, \ldots, t_l\}$ is the set of transitions having $p$ in their preset and
$l \ge 2$. This rule states that at most one of the transitions that are in conflict
w.r.t. $p$ can occur.

- For each place $p$, and for all $i = 0, 1, \ldots, n-1$,

$p(i+1) \leftarrow p(i), \mathrm{not}\ t_1(i), \ldots, \mathrm{not}\ t_l(i)$ (6)

where $\{t_1, \ldots, t_l\}$ is the set of transitions having $p$ in their preset. This is
the frame axiom for $p$ stating that $p$ holds if no transition using it occurs.
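
The construction of $\Pi_A(N, n)$ can be sketched as a small rule generator in Python. The textual rule notation produced here (`<-`, `not`, `2{...}`) is only illustrative and is not claimed to be the input syntax of Smodels; the function name `program_a` is hypothetical, and the net structure is the running example reconstructed from Fig. 2 (an assumption).

```python
def program_a(pre, post, n):
    """Generate the rules of the step-semantics program as plain strings."""
    places = sorted(set().union(*pre.values(), *post.values()))
    rules = [f"{{{p}(0)}} <-" for p in places]            # unconstrained initial marking
    for i in range(n):
        for t in sorted(pre):
            body = ", ".join(f"{p}({i})" for p in sorted(pre[t]))
            rules.append(f"{{{t}({i})}} <- {body}")       # rule (3)
            for p in sorted(post[t]):
                rules.append(f"{p}({i + 1}) <- {t}({i})")  # rule (4)
        for p in places:
            consumers = sorted(t for t in pre if p in pre[t])
            if len(consumers) >= 2:                        # rule (5)
                rules.append("<- 2{" + ", ".join(f"{t}({i})" for t in consumers) + "}")
            frame = [f"{p}({i})"] + [f"not {t}({i})" for t in consumers]
            rules.append(f"{p}({i + 1}) <- " + ", ".join(frame))  # rule (6)
    return rules


PRE = {"t1": {"p3"}, "t2": {"p1", "p2"}, "t3": {"p2"}, "t4": {"p4"}, "t5": {"p2"}}
POST = {"t1": {"p1"}, "t2": {"p3", "p4"}, "t3": {"p5"}, "t4": {"p2"}, "t5": {"p5"}}

for rule in program_a(PRE, POST, 1):
    print(rule)
```

For the running example and $n = 1$ this yields 22 rules, and the number of rules grows linearly with $n$.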

Consider net $N$ in Fig. 1 for which program $\Pi_A(N, n)$ is given in Fig. 2. In
$\Pi_A(N, n)$ the initial marking is not constrained but any Boolean combination $C$
of marking conditions can be captured with a set of rules $\Pi_M(C, i)$ [16]. For
example, to eliminate stable models not satisfying a condition $C$ at step $i$ saying
that $M(p_1) = 1$ and ($M(p_2) = 0$ or $M(p_3) = 1$), it is sufficient to use rules
$\Pi_M(C, i)$:

$\leftarrow \mathrm{not}\ c(i)$                                     $c_{\neg p_2 \lor p_3}(i) \leftarrow \mathrm{not}\ p_2(i)$

$c(i) \leftarrow p_1(i), c_{\neg p_2 \lor p_3}(i)$                  $c_{\neg p_2 \lor p_3}(i) \leftarrow p_3(i)$

Our approach can solve a reachability problem for a set of initial markings
given by a condition $C_0$ where the markings to be reached are specified by
another condition $C$.

Theorem 2. Let $N = (P, T, F)$ be a 1-safe P/T-net for all initial markings
satisfying a condition $C_0$. Net $N$ has an initial marking satisfying $C_0$ such that
a marking satisfying a condition $C$ is reachable in at most $n$ steps iff
$\Pi_M(C_0, 0) \cup \Pi_A(N, n) \cup \Pi_M(C, n)$ has a stable model.

The deadlock detection problem is now just a special case of a reachability
property: just add rules $\Pi_M(C, n) = \Pi_D(N, n)$ eliminating stable models where
some transition is enabled. Program $\Pi_D(N, n)$ includes for each transition $t \in T$
and its preset $\{p_1, \ldots, p_l\}$, a rule

$\leftarrow p_1(n), \ldots, p_l(n)$.

For our running example, the rules $\Pi_D(N, n)$ are

$\leftarrow p3(n)$    $\leftarrow p1(n), p2(n)$    $\leftarrow p2(n)$    $\leftarrow p4(n)$    $\leftarrow p2(n)$ (7)
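
A minimal Python sketch of generating the deadlock constraints follows. The rule notation is illustrative only and not claimed to be Smodels input syntax, `program_d` is a hypothetical name, and the presets are those of the running example reconstructed from Fig. 2 (an assumption). The sketch also drops duplicate constraints, which arise when two transitions share the same preset.

```python
def program_d(pre, n):
    """One constraint per transition: its preset must not be fully marked at step n."""
    rules = []
    for t in sorted(pre):
        rule = "<- " + ", ".join(f"{p}({n})" for p in sorted(pre[t]))
        if rule not in rules:  # transitions with equal presets give the same rule
            rules.append(rule)
    return rules


PRE = {"t1": {"p3"}, "t2": {"p1", "p2"}, "t3": {"p2"}, "t4": {"p4"}, "t5": {"p2"}}
print(program_d(PRE, 3))
# ['<- p3(3)', '<- p1(3), p2(3)', '<- p2(3)', '<- p4(3)']
```

Here `t3` and `t5` contribute the same constraint, so four rules remain.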
3.2 Bounded LTL Model Checking

Our strategy for finding counterexamples for LTL formula $\varphi$ (i.e., executions
satisfying $\neg\varphi$) is exactly the same as in [1]. There it is shown to be an
approximation of the unbounded version which becomes equivalent to the unbounded
case if the bound used is sufficiently increased. We (as they do) require that all
reachable states of the system have a successor (i.e., there are no deadlocks). In
this case the reachability of a marking satisfying a condition $C$ is equivalent to
finding a counterexample for an LTL formula of the form $\Box\neg C$.

We look for two different kinds of counterexamples. On the left in Fig. 3 is 
a loop counterexample, and on the right is a counterexample without loop. Loop 
counterexamples specify an infinite execution themselves, while counterexamples 
without a loop specify a prefix of an execution, which can be always extended 
to an infinite execution (by the deadlock freeness assumption). The arcs of the 
figure denote the “next state” of each state. Notice in the loop counterexample
that if $M_{i-1}$ is equivalent to the last state $M_n$, the state $M_i$ is the “next state”
of $M_n$. Our semantics is cautious in the case without loop, and extending the
execution into an infinite one in any way will yield a counterexample.^2



Fig. 3. Two counterexample possibilities



An LTL formula is said to be in positive normal form when all negations
in the formula appear directly before an atomic proposition. A formula can be
put into positive normal form with the following equivalences (and their duals):
$\neg\neg\varphi = \varphi$, $\neg(\varphi_1 \lor \varphi_2) = \neg\varphi_1 \land \neg\varphi_2$, and $\neg(\varphi_1\, U\, \varphi_2) = \neg\varphi_1\, R\, \neg\varphi_2$.
Given an LTL formula $f$ in positive normal form (when the formula to be
model checked is $\varphi$, the formula $f$ is equivalent to $\neg\varphi$ with negations pushed in),
and a bound $n \ge 1$ we construct a program $\Pi_{LTL}(f, n)$ as follows.

- Guess which state is equivalent to the last. For all $0 \le i \le n-1$ add rule

$\{el(i)\} \leftarrow$. (8)

- Disallow guessing two or more. (Guessing none is allowed though.) Add rule

$\leftarrow 2\{el(0), el(1), \ldots, el(n-1)\}$. (9)



^2 Actually the counterexamples without loop are exactly the informative safety
counterexamples of [13].
Formula type              Translation

$p$, for $p \in AP$       $f(i) \leftarrow p(i)$

$\neg p$, for $p \in AP$  $f(i) \leftarrow \mathrm{not}\ p(i)$

$f_1 \lor f_2$            $f(i) \leftarrow f_1(i)$
                          $f(i) \leftarrow f_2(i)$

$f_1 \land f_2$           $f(i) \leftarrow f_1(i), f_2(i)$

$f_1\, U\, f_2$           $f(i) \leftarrow f_2(i)$
                          $f(i) \leftarrow f_1(i), f(i+1)$
                          $f(n+1) \leftarrow nl(i), f(i)$

$f_1\, R\, f_2$           $f(i) \leftarrow f_2(i), f_1(i)$
                          $f(i) \leftarrow f_2(i), f(i+1)$
                          $f(n+1) \leftarrow nl(i), f(i)$
                          $f(n+1) \leftarrow l, \mathrm{not}\ cstate(f)$
                          $cstate(f) \leftarrow il(i), \mathrm{not}\ f_2(i)$

Fig. 4. Translation of an LTL formula $f$



- Check that the guess is correct. For all $0 \le i \le n-1$, $p \in P$ include rules

$\leftarrow el(i), p(i), \mathrm{not}\ p(n)$        $\leftarrow el(i), p(n), \mathrm{not}\ p(i)$.

- Specify auxiliary loop related atoms. For all $0 \le i \le n-1$, include rules

$l \leftarrow el(i)$    $nl(i+1) \leftarrow el(i)$    $il(i+1) \leftarrow el(i)$    $il(i+1) \leftarrow il(i)$.

See Fig. 3 for an example. The $nl(i)$ atom is in a model for the “next state”
of the last state, while $il(i)$ is in the model for all states in the loop.

- Require that if a loop exists, the last step contains a transition to disallow
looping by idling. Add the rule

$\leftarrow l, \mathrm{not}\ t_1(n-1), \ldots, \mathrm{not}\ t_k(n-1)$ (10)

where $\{t_1, \ldots, t_k\} = T$, i.e., the set of all transitions.

- Allow at most one visible transition in a step to eliminate steps which cannot
be interleaved to yield a counterexample. For all $0 \le i \le n-1$, add rule

$\leftarrow 2\{t_1(i), \ldots, t_k(i)\}$ (11)

where $\{t_1, \ldots, t_k\}$ is the set of visible transitions, i.e., the transitions whose
firing changes the marking of a place $p$ appearing in the formula $f$.

We recursively translate the formula $f$ by first translating its subformulae, and
then $f$ as follows. For all $0 \le i \le n$, add the rules given by Fig. 4.^3 Finally we
require that the top level formula $f$ should hold in the initial marking

$\leftarrow \mathrm{not}\ f(0)$. (12)

With this program $\Pi_{LTL}(f, n)$ we get our main result.

Theorem 3. Let $f$ be an LTL formula in positive normal form and $N = (P, T, F)$
be a 1-safe and deadlock free P/T-net for all initial markings satisfying a condition
$C_0$. If $\Pi_M(C_0, 0) \cup \Pi_A(N, n) \cup \Pi_{LTL}(f, n)$ has a stable model, then there
is an execution of $N$ from an initial marking satisfying $C_0$ which satisfies $f$.

^3 An equivalence explaining the release translation: $f_1\, R\, f_2 = (\Box f_2) \lor (f_2\, U\, (f_2 \land f_1))$.
The size of the program in Theorem 3 is linear in the size of the net and
formula, i.e., $O((|P| + |T| + |F| + |f|) \cdot n)$. The semantics of LTL is defined over
interleaving executions. A novelty of the translation is that it allows concurrency
between invisible transitions.

Forcing interleaving semantics. We can create the interleaving semantics versions
of bounded model checking problems by adding a set of rules $\Pi_I(N, n)$. It
includes for each time step $0 \le i \le n-1$ a rule

$\leftarrow 2\{t_1(i), \ldots, t_m(i)\}$ (13)

where $\{t_1, \ldots, t_m\}$ is the set of all transitions. These rules eliminate all stable
models having more than one transition firing in a step.

Corollary 1. Let $\Pi_S(N, n)$ be a program solving a bounded model checking problem
in the step semantics using any of the translations above. Then the program
$\Pi_S(N, n) \cup \Pi_I(N, n)$ solves the same problem in the interleaving semantics.

3.3 Relation to Previous Work 

In previous work on bounded model checking little attention has been given 
to the knowledge representation problem of encoding succinctly the unfolded 
behavior and the temporal property. We address this problem and develop an 
encoding of the behavior of an asynchronous system which is linear in the size of 
the system description (Petri net) and in the number of steps. Moreover, it allows 
the exploitation of the inherent concurrency of the system in model checking. 

Our approach could be used as a basis for a similar treatment using propo- 
sitional logic and satisfiability (SAT) checkers. For simple temporal properties 
such as reachability and deadlock this is fairly straightforward to develop us- 
ing the ideas of Clark’s completion and Fages’ theorem [7]. This is because our 
encoding produces acyclic programs except for the choice rules which need a 
special treatment. To achieve a compact SAT encoding is more challenging because
propositional logic lacks cardinality constraint rules (2). Their mapping
to propositional formulae can result in a quadratic blow-up which is sometimes
significant as conflicts may involve even hundreds of transitions.
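
The quadratic blow-up can be made concrete with the standard pairwise "at most one" clause encoding, used here as an illustrative assumption since the text does not fix a particular mapping. In the Python sketch below, a clause is a tuple of literal strings and a leading `-` marks negation (a hypothetical representation).

```python
from itertools import combinations


def at_most_one_clauses(atoms):
    """Pairwise encoding: for every pair, at least one of the two atoms is false."""
    return [(f"-{a}", f"-{b}") for a, b in combinations(atoms, 2)]


conflict = [f"t{k}" for k in range(1, 101)]  # a conflict among 100 transitions
print(len(at_most_one_clauses(conflict)))    # 4950 clauses for one time step
```

A single conflict rule of form (2) over $m$ atoms thus corresponds to $m(m-1)/2$ binary clauses, and this cost is paid again at every time step.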

For general LTL model checking a succinct SAT encoding is challenging. The 
compactness of our encoding is due to the fact that stable model semantics sup- 
ports the smallest fixed point evaluation of recursive rules which is exploited in 
translating the U and R operators. Because of these recursive rules a similar com- 
pact SAT encoding is not immediate. In [1] a SAT encoding is given. However, 
it is more complicated than our linear size encoding but remains polynomial. 

4 Experiments 

We have implemented the deadlock detection and LTL model checking translations
presented in the previous section. The translation is given a fixed initial
marking $M_0$, which allows the following optimizations to be implemented:
Problem      |P|    |T|    St. n   St. s    Int. n   Int. s   States
DARTES(1)    331    257    32      0.5      32       0.5      >1500000
DP(6)        36     24     1       0.0      6        0.1      728
DP(8)        48     32     1       0.0      8        0.3      6560
DP(10)       60     40     1       0.0      10       3.3      59048
DP(12)       72     48     1       0.0      12       617.4    531440
ELEV(1)      63     99     4       0.0      9        0.4      163
ELEV(2)      146    299    6       0.5      12       3.9      1092
ELEV(3)      327    783    8       5.6      15       139.0    7276
ELEV(4)      736    1939   10      157.2    >13      1215.2   48217
HART(25)     127    77     1       0.0      >5       1.0      >1000000
HART(50)     252    152    1       0.0      >5       5.7      >1000000
HART(75)     377    227    1       0.0      >5       15.5     >1000000
HART(100)    502    302    1       0.0      >5       35.9     >1000000
KEY(2)       94     92     >25     1937.9   >26      56.1     536
MMGT(3)      122    172    7       11.1     10       87.2     7702
MMGT(4)      158    232    8       687.3    >11      1874.1   66308
Q(1)         163    194    9       0.1      >17      2733.7   123596

Fig. 5. Experiments



- Place and transition atoms are added only from the time step they can first
appear on. Only atoms for places $p(0)$ in the initial marking are created
for time $i = 0$. Then for each $0 \le i \le n-1$: (i) Add transition atoms for
all transitions $t(i)$ such that all the place atoms in the preset of $t(i)$ exist,
(ii) Add place atoms for all places $p(i+1)$ such that either the place atom
$p(i)$ exists or some transition atom in the preset of $p(i+1)$ exists.

- Duplicate rules are removed. Duplicates can appear in (5), (7).
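
The first optimization amounts to a forward pass over the time steps. The Python sketch below follows steps (i) and (ii) literally; it is an illustration rather than the implemented translator, `existing_atoms` is a hypothetical name, and the net is the running example reconstructed from Fig. 2 (an assumption).

```python
def existing_atoms(pre, post, m0, n):
    """Return (places, transitions): the atoms that can exist at each time step."""
    places = [set(m0)]          # only the initial marking exists at time 0
    transitions = []
    for i in range(n):
        # (i) a transition atom exists if all place atoms of its preset exist
        ts = {t for t in pre if pre[t] <= places[i]}
        transitions.append(ts)
        # (ii) a place atom exists at i+1 if it existed at i or is produced by ts
        nxt = set(places[i])
        for t in ts:
            nxt |= post[t]
        places.append(nxt)
    return places, transitions


PRE = {"t1": {"p3"}, "t2": {"p1", "p2"}, "t3": {"p2"}, "t4": {"p4"}, "t5": {"p2"}}
POST = {"t1": {"p1"}, "t2": {"p3", "p4"}, "t3": {"p5"}, "t4": {"p2"}, "t5": {"p5"}}

places, transitions = existing_atoms(PRE, POST, {"p1", "p2"}, 2)
print(sorted(transitions[0]))  # ['t2', 't3', 't5']
print(sorted(places[1]))       # ['p1', 'p2', 'p3', 'p4', 'p5']
```

In the running example no atoms for `t1` or `t4` are needed at time 0, because their presets cannot be marked before the first step.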

As benchmarks we use a set of deadlock detection benchmarks collected by 
Corbett [4], converted to 1-safe P/T-nets by Melzer and Romer [14]. The models 
were picked from those which have a deadlock. For each model and both seman- 
tics we incremented the used bound until a deadlock was found. We report the 
time for Smodels to find the first stable model using this bound. In some cases 
a model could not be found within a reasonable time in which case we report 
the time used to prove that there is no deadlock within the reported bound. Un- 
fortunately, we did not have a large collection of LTL model checking examples, 
and benchmarking the LTL translation is left for further work. The experimental 
results can be found in Fig. 5. The columns are: 

- Problem: The problem name with the size of the instance in parentheses.

- $|P|$: Number of places in the original net.

- $|T|$: Number of transitions in the original net.

- St. n: The smallest integer $n$ such that a deadlock could be found using the
step semantics / in case of $> n$ the largest integer $n$ for which we could prove
that there is no deadlock within that bound using the step semantics.

- St. s: The time in seconds to find the first stable model / to prove that there
is no stable model. (See St. n above.)

- Int. n and Int. s: defined as St. n and St. s but for the interleaving semantics.

- States: Number of reachable states of the P/T-net (if known).

These differ from the ones reported in [11] where unfortunately there are some errors. 
The times reported are the average of 5 runs of the time for smodels 2.26
as reported by the /usr/bin/time command on a 450 MHz Pentium III PC
running Linux. The used tools, nets, and logic programs are available from:
<http://www.tcs.hut.fi/~kepa/experiments/LPNMR2001/>.

In many of the experiments the step semantics version found a deadlock with 
a smaller bound than the interleaving one. Also, when the bound needed to find 
the deadlock was fairly small, the bounded model checker was performing well. 
In the examples ELEV(4), HART(x) and Q(1) we were able to find the counterexample
only when using step semantics. In the KEY(2) example we were
not able to find a counterexample with either semantics, even though the problem
is known to have only a small number of reachable states. In contrast, the
DARTES(1) problem has a large state-space, and despite this a counterexample
of length 32 was obtained. Overall, the results are promising, in particular, for 
small bounds and the step semantics. 

5 Conclusions 

We introduce bounded model checking of asynchronous concurrent systems mod- 
eled by 1-safe P/T-nets as an interesting application area for answer set program- 
ming. We present mappings from bounded reachability, deadlock detection and 
LTL model checking problems of 1-safe P/T-nets to stable model computation. 
Our approach is capable of doing model checking for a set of initial markings at 
once. This is usually difficult to achieve in current enumerative model checkers 
and often leads to state space explosion. We handle asynchronous systems using 
a step semantics whereas previous work on bounded model checking only uses 
the interleaving semantics [1]. Furthermore, our encoding is more compact than 
the previous approach employing propositional satisfiability [1]. This is because 
our rule based approach allows executions of the system, e.g., frame axioms,
to be represented succinctly and directly supports the recursive fixed point
computation needed to evaluate LTL formulae.

The first experimental results indicate that stable model computation is quite 
a competitive approach to searching for short executions of the system leading 
to deadlock and worth further study. More experimental work and comparisons 
are needed to determine the strength of the approach. In particular, for compar- 
ing with SAT checking techniques, it would be interesting to develop a similar 
treatment of asynchronous systems using a SAT encoding and compare it to the 
logic program based approach. 

Relating the net unfolding method (see [9,14] and further references there) 
to bounded model checking would be interesting. There are also alternative
semantics to the two presented in this work [10]; applying them to bounded LTL
model checking is left for further work.

Abstract. In this paper we suggest an architecture for a software agent
which operates a physical device and is capable of making observations 
and of testing and repairing the device components. We present novel 
definitions of the notions of symptom, candidate diagnosis, and diagnosis 
which are based on the theory of action language AC. The new definitions 
allow one to give a simple account of the agent’s behavior in which many 
of the agent’s tasks are reduced to computing stable models of logic 
programs. 



1 Introduction 

In this paper we continue the investigation of applicability of A-Prolog (a loosely 
defined collection of logic programming languages under the answer set seman- 
tics [6]) to knowledge representation and reasoning. The focus is on the develop- 
ment of an architecture for a software agent acting in a changing environment. 
We assume that the agent and the environment (sometimes referred to as a
dynamic system) satisfy the following simplifying conditions.

1. The agent’s environment is viewed as a transition diagram whose states are 
sets of fluents (relevant properties of the domain whose truth values may 
depend on time) and whose arcs are labeled by actions. 

2. The agent is capable of making correct observations, performing actions, and 
remembering the domain history. 

These assumptions hold in many realistic domains and are suitable for a broad 
class of applications. In many domains, however, the effects of actions and the 
truth values of observations can only be known with a substantial degree of 
uncertainty which cannot be ignored in the modeling process. It remains to be 
seen if some of our methods can be made to work in such situations. The above 
assumptions determine the structure of the agent’s knowledge base. It consists 
of three parts. The first part, called an action (or system) description, specifies 
the transition diagram representing possible trajectories of the system. It con- 
tains descriptions of domain’s actions and fluents, together with the definition 



T. Eiter, W. Faber, and M. Truszczynski (Eds.): LPNMR 2001, LNAI 2173, pp. 213—225, 2001. 
(c) Springer-Verlag Berlin Heidelberg 2001 




Michael Gelfond et al. 




Fig. 1. AC 



of possible successor states to which the system can move after an action a is
executed in a state σ. The second part of the agent’s knowledge, called history
description, contains observations made by the agent together with a record of 
its own actions. It defines a collection of paths in the diagram which can be inter- 
preted as the system’s possible pasts. If the agent’s knowledge is complete (i.e., it 
has complete information about the initial state and the occurrences of actions) 
and the system’s actions are deterministic then there is only one such path. The 
third part of the agent’s knowledge base contains a collection of the agent’s goals.
All this knowledge is used and updated by the agent who repeatedly executes 
the following steps: 

1. observe the world and interpret the observations; 

2. select a goal; 

3. plan; 

4. execute part of the plan. 

In this paper we concentrate on agents operating physical devices and capable 
of testing and repairing the device components. We are especially interested in
the first step of the loop, i.e. in the agent’s interpretations of discrepancies between
the agent’s predictions and the system’s actual behavior. The following example will
be used throughout the paper: 

Example 1. Consider a system S consisting of an analog circuit AC from Figure
1. We assume that switches s1 and s2 are mechanical components which cannot
become damaged. Relay r is a magnetic coil. If not damaged, it is activated
when s1 is closed, causing s2 to close. Undamaged bulb b emits light if s2 is
closed. For simplicity we consider an agent capable of performing only one
action, close(s1). The environment can be represented by two damaging exogenous
actions: brk, which causes b to become faulty, and srg, which damages r and
also b, assuming that b is not protected. Suppose that the agent operating this
device is given the goal of lighting the bulb. He realizes that this can be achieved
by closing the first switch, performs the operation, and discovers that the bulb
is not lit. The goal of the paper is to specify the agent’s behavior after this
discovery.
We start by presenting our definitions of the notions of symptom, candidate
diagnosis, and diagnosis which are based on the theory of action language AC [1].
These definitions are used to give a simple account of the agent’s behavior in 
which many of the agent’s tasks are reduced to computing stable models of logic 
programs. 



Background 

By a physical system S we mean a triple ⟨C, F, A⟩ of finite sets. Elements of C are
called components of S. Elements of F are referred to as fluents. By fluent literals
we mean fluents and their negations (denoted by ¬f). The set A of elementary
actions is partitioned into two disjoint sets, A_s and A_e; A_s consists of actions
performed by an agent and A_e consists of exogenous actions whose occurrence
can cause system components to malfunction.

A system S will be associated with the transition diagram T(S) (or simply T).
States of T are labeled by complete and consistent sets of fluent literals
corresponding to possible physical states of S. The arcs are labeled by subsets of A
called compound actions. Execution of a compound action {a_1, ..., a_k}
corresponds to the simultaneous execution of its components. Paths of T correspond
to possible behaviors (or trajectories) of S. To reason about S we need to have a
concise and convenient way to define its transition diagram. This will be done by
a system description SD(S) (or simply SD) consisting of rules of A-Prolog
defining components of S, its fluents and actions, causal laws determining the effects of
these actions, and the actions’ executability conditions. We assume that SD has
a unique answer set which defines an action description of AC. (In our further
discussion we will identify this action description with SD.) Causal laws of SD
can be divided into two parts. The first part, SD_n, contains laws describing
normal behavior of the system. Their bodies usually contain special fluent literals
of the form ¬ab(c). As usual, ab(c) is read as “component c of S is abnormal”. Its
use in diagnosis goes back to [15]. The second part, SD_b, describes effects of
exogenous actions damaging the components. Such laws normally contain relation
ab in the head or positive parts of the bodies.

In addition to describing all possible trajectories of S, we need to describe the
history of S up to a current moment n. This is done by a collection, Γ_n, of
statements in the ‘history description’ part of AC. We assume that the system’s
time is discrete and t_i and t_{i+1} stand for two consecutive moments of time in
the interval 0 ... n. Statements of Γ_n have the form:

1. obs(l, t) - ‘fluent literal l was observed to be true at moment t’;

2. hpd(a, t) - ‘elementary action a ∈ A was observed to happen at moment t’

where 0 ≤ t < n. For simplicity we only consider histories with observations
closed under the static causal rules of AC (i.e., if every state of S must satisfy a
constraint ‘fluent literal l_0 is true if fluent literals from P are true’ and literals
from P are observed in Γ_n then so must be l_0). Let S be a system with the
transition diagram T and let Γ_n be a history of S up to moment n. A path
⟨σ_0, a_0, σ_1, ..., a_{n-1}, σ_n⟩ in T is a model of Γ_n iff

1. a_k = {a : hpd(a, k) ∈ Γ_n};

2. if obs(l, k) ∈ Γ_n then l ∈ σ_k.



Γ_n is consistent (with respect to T) if it has a model. A fluent literal l holds
in a model M at time k ≤ n (M ⊨ h(l, k)) if l ∈ σ_k. Finally, Γ_n ⊨ h(l, k)
if, for every model M of Γ_n, M ⊨ h(l, k). Notice that, in contrast to action
description language C from [2], [3], a domain description of AC is consistent
only if changes in the observations of the system’s states can be explained without
assuming occurrences of any action not recorded in Γ_n.
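
The two conditions in the definition of a model can be checked mechanically. The following is only a minimal sketch under stated assumptions: the function name is_model, the encoding of statements as tuples, and the convention of writing ¬f as the string "-f" are illustrative choices and do not come from the paper. The path is assumed to be a path of T already; only conditions 1 and 2 are tested.

```python
def is_model(states, actions, history):
    """Check conditions 1 and 2 for the path states[0], actions[0], ..., states[n].

    states  : list of n+1 sets of fluent literals (strings; "-f" stands for the negation of f)
    actions : list of n sets of elementary actions (one compound action per step)
    history : iterable of ("obs", literal, t) and ("hpd", action, t) statements
    The path is assumed (not checked) to be a path of the transition diagram.
    """
    n = len(actions)
    recorded = [set() for _ in range(n)]
    for kind, item, t in history:
        if kind == "hpd":
            if not 0 <= t < n:
                return False          # an action recorded outside the path
            recorded[t].add(item)
        elif kind == "obs":
            if not 0 <= t <= n or item not in states[t]:
                return False          # condition 2 is violated
        else:
            raise ValueError(f"unknown statement kind: {kind}")
    # condition 1: every compound action is exactly the set of recorded actions
    return all(actions[k] == recorded[k] for k in range(n))


# A hypothetical two-state path over a fragment of the circuit's fluents.
states = [{"-closed(s1)", "-on(b)"}, {"closed(s1)", "on(b)"}]
history = [("hpd", "close(s1)", 0), ("obs", "-closed(s1)", 0)]
print(is_model(states, [{"close(s1)"}], history))         # recorded action matches the arc
print(is_model(states, [{"close(s1)", "brk"}], history))  # brk is on the arc but not recorded
```

The second call fails condition 1, which reflects the remark above: a path may not contain occurrences of actions that the history does not record.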

The following is a description, SD, of system S from Example 1:

Fluents:

comp(r). comp(b). switch(s1). switch(s2).
f(active(r)). f(on(b)). f(prot(b)).
f(closed(SW)) ← switch(SW).
f(ab(X)) ← comp(X).

Agent Actions:

a_act(close(s1)).

Exogenous Actions:

x_act(brk).
x_act(srg).

Causal Laws and Executability Conditions describing normal functioning of S:

SD_n:

causes(close(s1), closed(s1), []).
caused(active(r), [closed(s1), ¬ab(r)]).
caused(closed(s2), [active(r)]).
caused(on(b), [closed(s2), ¬ab(b)]).
caused(¬on(b), [¬closed(s2)]).
impossible_if(close(s1), [closed(s1)]).

(causes(A, L, P) says that execution of action A in a state satisfying fluent
literals from P causes fluent literal L to become true in a resulting state; caused(L, P)
means that every state satisfying P must also satisfy L; impossible_if(A, P)
indicates that action A is not executable in states satisfying P.) The system’s
malfunctioning information is given by:



SD_b:

causes(brk, ab(b), []).
causes(srg, ab(r), []).
causes(srg, ab(b), [¬prot(b)]).
caused(¬on(b), [ab(b)]).
caused(¬active(r), [ab(r)]).



Now consider a history, Γ_0, of S:

Γ_0:

hpd(close(s1), 0).
obs(¬closed(s1), 0).
obs(¬closed(s2), 0).
obs(¬ab(b), 0).
obs(¬ab(r), 0).
obs(prot(b), 0).

It is easy to see that the path ⟨σ_0, close(s1), σ_1⟩ is the only model of Γ_0 and that
Γ_0 ⊨ h(on(b), 1).
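
The prediction h(on(b), 1) can be reproduced with a few lines of ordinary code. This is a hedged sketch rather than the A-Prolog encoding used in the paper: the functions static_closure and step are hypothetical names, a state is represented by the set of fluents that are true, and the derived fluents active(r), closed(s2) and on(b) are simply recomputed from the others. That shortcut is adequate for a single step from the initial state of Γ_0, but it is not a general implementation of the semantics of the action language.

```python
# Fluents whose values are determined by the static causal laws of SD.
DERIVED = {"active(r)", "closed(s2)", "on(b)"}


def static_closure(fluents):
    """Recompute the derived fluents from the remaining ones (assumption: s2 starts open)."""
    s = set(fluents) - DERIVED
    if "closed(s1)" in s and "ab(r)" not in s:
        s.add("active(r)")        # caused(active(r), [closed(s1), -ab(r)])
    if "active(r)" in s:
        s.add("closed(s2)")       # caused(closed(s2), [active(r)])
    if "closed(s2)" in s and "ab(b)" not in s:
        s.add("on(b)")            # caused(on(b), [closed(s2), -ab(b)])
    return frozenset(s)


def step(state, actions):
    """Successor state after the compound action `actions` is executed in `state`."""
    s = set(state)
    if "close(s1)" in actions:
        if "closed(s1)" in state:
            raise ValueError("close(s1) is not executable: s1 is already closed")
        s.add("closed(s1)")       # causes(close(s1), closed(s1), [])
    if "brk" in actions:
        s.add("ab(b)")            # causes(brk, ab(b), [])
    if "srg" in actions:
        s.add("ab(r)")            # causes(srg, ab(r), [])
        if "prot(b)" not in state:
            s.add("ab(b)")        # causes(srg, ab(b), [-prot(b)])
    return static_closure(s)


# Initial state described by the history: switches open, components ok, b protected.
sigma0 = static_closure({"prot(b)"})
sigma1 = step(sigma0, {"close(s1)"})
print(sorted(sigma1))
```

Under these assumptions on(b) belongs to sigma1, in agreement with the statement that the history entails h(on(b), 1).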



2 Basic Definitions 

Let S be a system with the transition diagram T, n be a moment of time, O_n
be a collection of observations made by the agent starting at n, and Γ_{n-1} be the
previous history of S. We say that a pair

𝒮 = ⟨Γ_{n-1}, O_n⟩ (1)

is a symptom of the system’s malfunctioning if Γ_{n-1} is consistent (w.r.t. T) and
Γ_{n-1} ∪ O_n is not. Our definition of a candidate diagnosis of symptom (1) is based
on the notion of explanation from [1]. In our terminology, an explanation, E, of
symptom (1) is a collection of statements

E = {hpd(a_i, t) : 0 ≤ t < n and a_i ∈ A_e} (2)

such that Γ_{n-1} ∪ O_n ∪ E is consistent.

Definition 1. A candidate diagnosis D of symptom (1) consists of an
explanation E(D) of (1) together with the set Δ(D) of components of S which could
possibly be damaged by actions from E(D). More precisely, Δ(D) = {c : M ⊨
h(ab(c), n − 1)} for some model M of Γ_{n-1} ∪ O_n ∪ E(D).

Definition 2. We say that a diagnosis of a symptom 𝒮 = ⟨Γ_{n-1}, O_n⟩ is a
candidate diagnosis in which all components in Δ are faulty.



3 Computing Candidate Diagnoses 

In this section we show how the need for diagnosis can be determined and can- 
didate diagnoses found by the techniques of answer set programming [10]. 

Consider a system description SD of S whose behavior up to the moment n − 1
from some interval [0, N] is described by history Γ_{n-1}. (We assume that N is
sufficiently large for our application.) We start by describing an encoding of
SD into programs of A-Prolog suitable for execution by SMODELS [14]. Since
SMODELS takes as an input programs with finite Herbrand bases, references to
lists should be eliminated from SD. To do that we expand the signature of SD
by new terms - names of the corresponding causal laws - and consider a mapping
α defined as follows:
1. α(causes(a, l_0, [l_1 ... l_m])) is the collection of atoms d_law(d), head(d, l_0),
action(d, a), prec(d, i, l_i) for 1 ≤ i ≤ m, and prec(d, m + 1, nil). (Here and
below d will refer to the name of the corresponding law.)

2. α(caused(l_0, [l_1 ... l_m])) is the collection of atoms s_law(d), head(d, l_0),
prec(d, i, l_i) for 1 ≤ i ≤ m, and prec(d, m + 1, nil).

3. α(impossible_if(a, [l_1 ... l_m])) is a constraint

← h(l_1, T), ..., h(l_m, T),
  o(a, T).

where o(a, t) stands for ‘action a occurred at time t’.
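
Items 1 and 2 of the mapping amount to flattening the list of preconditions of a law into indexed prec atoms. The sketch below illustrates this under stated assumptions: laws are encoded as Python tuples, atoms as tuples whose first element is the predicate name, and the function name alpha and the law name d3 are hypothetical. Item 3 (the constraint for impossible_if) is not covered.

```python
def alpha(law_name, law):
    """Flatten one causal law into list-free atoms, following items 1 and 2 above.

    law is ("causes", action, head, preconditions) for a dynamic causal law or
    ("caused", head, preconditions) for a static one; literals are plain strings.
    """
    kind = law[0]
    if kind == "causes":
        _, action, head, preconditions = law
        atoms = [("d_law", law_name), ("head", law_name, head),
                 ("action", law_name, action)]
    elif kind == "caused":
        _, head, preconditions = law
        atoms = [("s_law", law_name), ("head", law_name, head)]
    else:
        raise ValueError(f"unsupported kind of law: {kind}")
    # prec(d, i, l_i) for 1 <= i <= m, followed by the terminator prec(d, m + 1, nil)
    atoms += [("prec", law_name, i, lit) for i, lit in enumerate(preconditions, start=1)]
    atoms.append(("prec", law_name, len(preconditions) + 1, "nil"))
    return atoms


# causes(srg, ab(b), [-prot(b)]) from SD_b, under the hypothetical law name d3.
for atom in alpha("d3", ("causes", "srg", "ab(b)", ["-prot(b)"])):
    print(atom)
```

The terminating atom with nil gives a recursion over preconditions a base case without any reference to lists.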

By α(SD) we denote the result of applying α to the laws of SD. Finally, for any
history, Γ, of S

α(SD, Γ) = Π ∪ α(SD) ∪ Γ

where Π is defined as follows:



1. h(L, T') ← d_law(D), head(D, L), action(D, A), o(A, T), prec_h(D, T).

2. h(L, T) ← s_law(D), head(D, L), prec_h(D, T).

3. all_h(D, N, T) ← prec(D, N, nil).

4. all_h(D, N, T) ← prec(D, N, P), h(P, T), all_h(D, N', T).

5. prec_h(D, T) ← all_h(D, 1, T).

6. h(L, T') ← h(L, T), not h(L̄, T').

7. o(A, T) ← hpd(A, T).

8. h(L, 0) ← obs(L, 0).

9. ← obs(L, T), not h(L, T).

(In rule 6, L̄ denotes the literal complementary to L.)



Here D, A, L are variables for the names of laws, actions, and fluent literals
respectively, T, T' denote consecutive time points from the interval [0, N], and
N, N' are variables for consecutive integers. (The corresponding typing
predicates in the bodies of some rules of Π are omitted to save space; o is used instead
of hpd to distinguish between actions observed and actions hypothesized.) The
following terminology will be useful for describing the relationship between
answer sets of α(SD, Γ_{n-1}) and models of Γ_{n-1}.

We say that an answer set AS of α(SD, Γ_{n-1}) defines the trajectory
p = ⟨σ_0, a_0, σ_1, ..., a_{n-2}, σ_{n-1}⟩ where σ_k = {l : h(l, k) ∈ AS} and a_k = {a :
o(a, k) ∈ AS}.
The following theorem establishes the relationship between the theory of actions
in AC and logic programming.

Theorem 1. If the initial situation of Γ_{n-1} is complete, i.e. for any fluent f
of SD, Γ_{n-1} contains obs(f, 0) or obs(¬f, 0), then M is a model of Γ_{n-1} iff M
is a trajectory defined by some answer set of α(SD, Γ_{n-1}).

(The theorem is similar to the result from [18], which deals with a different
language and uses the definitions from [11].)

Now let 𝒮 be a symptom of the form (1), and let

TEST(𝒮) = α(SD, Γ_{n-1}) ∪ O_n ∪ R (3)

where R consists of the rules

obs(f, 0) ← not obs(¬f, 0).
obs(¬f, 0) ← not obs(f, 0).

for any fluent f ∈ F. The rules of R are sometimes called the awareness axioms.
They guarantee that initially the agent considers all possible values of the
domain fluents. (If the agent’s information about the initial state of the system is
complete these axioms can be omitted.) The following corollary forms the basis
for our diagnostic algorithms.

Corollary 1. Let 𝒮 = ⟨Γ_{n-1}, O_n⟩ where Γ_{n-1} is consistent. Then 𝒮 is a
symptom of the system’s malfunctioning iff the program
TEST(𝒮) has no answer set.

To diagnose the system, S, we construct a program, DM, defining an
explanation space of our diagnostic agent - a collection of sequences of exogenous
events which could happen (unobserved) in the system’s past and serve as
possible explanations of unexpected observations. We call such programs diagnostic
modules for S. The simplest diagnostic module, DM_0, is defined by rules:

DM_0:

o(A, T) ← 0 ≤ T < n, x_act(A),
          not ¬o(A, T).

¬o(A, T) ← 0 ≤ T < n, x_act(A),
           not o(A, T).

or, in the more compact choice rule notation of SMODELS ([16]),

{o(A, T) : x_act(A)} ← 0 ≤ T < n.

(Recall that a choice rule has the form

m {p(X) : q(X)} n ← body

and says that, if the body is satisfied by an answer set AS of a program then
AS must contain between m and n atoms of the form p(t) such that q(t) ∈ AS.)
Finding candidate diagnoses of symptom (1) can be reduced to finding answer
sets of a diagnostic program

𝒟_0(𝒮) = TEST(𝒮) ∪ DM_0 (4)

It is not difficult to see that DM_0 generates every possible sequence of the past
occurrences of exogenous actions and hence, by Theorem 1, 𝒟_0(𝒮) finds all the
candidate diagnoses of 𝒮.

Example 2. Let us again consider system S from Example 1. According to Γ_0,
initially the switches s1 and s2 are open, all circuit components are ok, s1 is
closed by the agent, and b is protected. It is predicted that b will be on at
time 1. Suppose that, instead, the agent observes that at time 1 bulb b is off, i.e.
O_1 = {obs(¬on(b), 1)}. Intuitively, this is viewed as a symptom 𝒮_0 = ⟨Γ_0, O_1⟩
of malfunctioning of S. By running SMODELS on TEST(𝒮_0) we discover that
this program has no answer sets and therefore, by Corollary 1, 𝒮_0 is indeed a
symptom. Diagnoses of 𝒮_0 can be found by running SMODELS on 𝒟_0(𝒮_0) and
extracting the necessary information from the computed answer sets. It is easy
to check that, as expected, there are three candidate diagnoses:

D_1 = ⟨{o(brk, 0)}, {b}⟩

D_2 = ⟨{o(srg, 0)}, {r}⟩

D_3 = ⟨{o(brk, 0), o(srg, 0)}, {b, r}⟩

which corresponds to our intuition. Theorem 1 guarantees correctness of this
computation.

The basic diagnostic module 𝒟_0 can be modified in many different ways. For
instance, a simple modification, 𝒟_1(𝒮), which eliminates some candidate diagnoses
containing actions unrelated to the corresponding symptom can be constructed
as follows: Let

DM_1 = DM_0 ∪ REL



where

REL:

1. rel(A, L) ← d_law(D),
               head(D, L),
               action(D, A),
               x_act(A).

2. rel(A, L) ← s_law(D),
               head(D, L),
               prec(D, N, P),
               rel(A, P),
               x_act(A).

3. rel(A) ← obs(L, T),
            T ≥ n,
            rel(A, L).

4. ← T < n,
     o(A, T),
     not hpd(A, T),
     not rel(A).
and let

𝒟_1(𝒮) = TEST(𝒮) ∪ DM_1

It is easy to see that this modification is safe, i.e. 𝒟_1 will not miss any useful
predictions about the malfunctioning components.¹

Example 3. Let us expand the system S from Example 1 by a new component, c,
unrelated to the circuit, and an exogenous action a which damages this
component. It is easy to see that the symptom 𝒮_0 from Example 2 will still be a symptom
of malfunctioning of a new system, S_a, and that the basic diagnostic module
applied to S_a will return diagnoses D_1 - D_3 from Example 2 together with new
diagnoses containing a and ab(c), e.g.

D_a = ⟨{o(brk, 0), o(a, 0)}, {b, c}⟩

Diagnostic module 𝒟_1 will ignore actions unrelated to 𝒮 and return only D_1 - D_3.
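
For a system as small as the circuit of Example 1, the candidate diagnoses of Example 2 can also be found by brute force, which makes the generate-and-test reading of the basic diagnostic module concrete. The sketch below is not the SMODELS program of the paper: the functions predict and candidate_diagnoses are hypothetical, the effect of close(s1) at time 0 is hard-wired, and the set of abnormal components is read off the state reached at time 1, which matches the sets listed in Example 2.

```python
from itertools import combinations


def predict(exogenous, protected=True):
    """State at time 1 after close(s1) and the given exogenous actions occur at time 0.

    Returns (bulb_is_on, abnormal_components).
    """
    ab = set()
    if "brk" in exogenous:
        ab.add("b")                      # causes(brk, ab(b), [])
    if "srg" in exogenous:
        ab.add("r")                      # causes(srg, ab(r), [])
        if not protected:
            ab.add("b")                  # causes(srg, ab(b), [-prot(b)])
    active = "r" not in ab               # s1 is closed at time 1
    closed_s2 = active
    on = closed_s2 and "b" not in ab
    return on, frozenset(ab)


def candidate_diagnoses(observed_on, protected=True, x_acts=("brk", "srg")):
    """Generate every set of exogenous actions at time 0 and keep the consistent ones."""
    result = []
    for size in range(len(x_acts) + 1):
        for explanation in combinations(x_acts, size):
            on, ab = predict(explanation, protected)
            if on == observed_on:        # consistent with the observation at time 1
                result.append((frozenset(explanation), ab))
    return result


# The bulb is observed to be off at time 1, although it was predicted to be on.
for explanation, delta in candidate_diagnoses(observed_on=False):
    print(sorted(explanation), sorted(delta))
```

The empty explanation is rejected because it predicts that the bulb is on, which is the brute-force counterpart of TEST having no answer set. Calling candidate_diagnoses(False, protected=False) models an unprotected bulb, where a power surge damages both r and b.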

It may be worth noticing that the distinction between hpd and o allows actions
unrelated to observations at n to actually happen at moment n − 1. Constraint
(4) of REL only prohibits generating such actions in our search for diagnosis.
Even more unrelated actions can be eliminated from the search space of our
diagnostic modules by considering a relevance relation rel depending on time.
The diagnostic module 𝒟_1 can also be further modified by limiting its search to
recent occurrences of exogenous actions. This can be done by

𝒟_2(𝒮) = TEST(𝒮) ∪ DM_2

where DM_2 is obtained by replacing an atom 0 ≤ T < n in the bodies of rules
of DM_0 by n − m ≤ T < n. The constant m determines the time interval in the
past that an agent is willing to consider in its search for possible explanations.
To simplify our discussion in the rest of the paper we assume that m = 1. Finally,
the rule

← k {o(A, n − 1)}.

added to DM_2 will eliminate all diagnoses containing more than k actions. Of
course the resulting module 𝒟_3 as well as 𝒟_2 can miss some diagnoses, and
deepening of the search and/or increase of k may be necessary if no diagnosis
of a symptom is found. There are many other interesting ways of constructing
efficient diagnostic modules. We are especially intrigued by the possibilities of
using new features of answer set solvers such as weight rules of SMODELS and
soft constraints of DLV [19] to specify a preference relation on diagnoses. This,
however, is a subject of further investigation. Suppose now the diagnostician has
a candidate diagnosis D of a symptom 𝒮. Is it indeed a diagnosis?

¹ In the full paper we will make this and other similar statements mathematically
precise.
4 Finding a Diagnosis

To answer this question the agent should be able to test components of Δ(D).
Assuming that no exogenous actions occur during testing, a diagnosis can be
found by the following simple algorithm, Find_Diag(𝒮):

function Find_Diag(𝒮)
  repeat
    ⟨E, Δ⟩ := Candidate_Diag(𝒮);
    diag := true; Δ_0 := Δ;
    while Δ_0 ≠ ∅ and diag do
      select c ∈ Δ_0; Δ_0 := Δ_0 \ {c};
      if faulty(c) then
        O_n := O_n ∪ {obs(ab(c), n)};
      else
        O_n := O_n ∪ {obs(¬ab(c), n)};
        diag := false;
      end
    end {while}
  until diag or Δ = ∅;
  return ⟨E, Δ⟩.

The algorithm uses functions Candidate_Diag(𝒮), which returns a candidate
diagnosis ⟨E, Δ⟩ of 𝒮, and faulty(c), which checks if a component c of S is faulty.
Notice that Δ = ∅ indicates that no diagnosis is found - the diagnostician failed.
To illustrate the algorithm, consider

Example 4. Consider the system S from Example 1 and a history Γ_0 in which b is
not protected, all components of S are ok, both switches are open, and the agent
closes s1 at time 0. At time 1, he observes that the bulb b is not lit, considers 𝒮 =
⟨Γ_0, O_1⟩ where O_1 = {obs(¬on(b), 1)} and calls function Need_Diag(𝒮) which
searches for an answer set of TEST(𝒮). There are no such sets, so the diagnostician
realizes he has a symptom to diagnose and calls function Find_Diag(𝒮). Let us
assume that the first call to Candidate_Diag returns

PD_1 = ⟨{o(srg, 0)}, {r, b}⟩

Suppose that the agent selects component r from Δ and determines that it is not
faulty. Observation obs(¬ab(r), 1) will be added to O_1, diag will be set to false
and the program will call Candidate_Diag again with the updated symptom 𝒮
as a parameter. Candidate_Diag will return another possible diagnosis

PD_2 = ⟨{o(brk, 0)}, {b}⟩

The agent will test bulb b, find it to be faulty, add observation obs(ab(b), 1) to O_1
and return PD_2.
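
The run described in Example 4 can be traced with a direct transcription of Find_Diag. The following is a sketch under stated assumptions, not the authors' implementation: Candidate_Diag is replaced by a hypothetical stub that scans a fixed list of candidates for the unprotected bulb, observations about ab(c) are kept in a dictionary, and the order in which components are selected is fixed by the select parameter so that r is tested first, as in the example.

```python
def find_diag(candidate_diag, faulty, observations, select=max):
    """Transcription of Find_Diag; `observations` maps a component to True (ab) or False."""
    while True:
        explanation, delta = candidate_diag(observations)
        diag = True
        remaining = set(delta)
        while remaining and diag:
            c = select(remaining)
            remaining.discard(c)
            if faulty(c):
                observations[c] = True       # record obs(ab(c), n)
            else:
                observations[c] = False      # record obs(-ab(c), n)
                diag = False
        if diag or not delta:                # an empty delta means no diagnosis was found
            return explanation, delta


# Hypothetical stand-in for Candidate_Diag: candidates for the unprotected bulb.
CANDIDATES = [
    (frozenset({"srg"}), frozenset({"r", "b"})),
    (frozenset({"brk"}), frozenset({"b"})),
    (frozenset({"brk", "srg"}), frozenset({"r", "b"})),
]


def candidate_diag(observations):
    """Return the first candidate that agrees with every test result recorded so far."""
    for explanation, delta in CANDIDATES:
        if all((c in delta) == is_ab for c, is_ab in observations.items()):
            return explanation, delta
    return frozenset(), frozenset()


observations = {}
actually_faulty = {"b"}                      # in this scenario only the bulb is broken
result = find_diag(candidate_diag, lambda c: c in actually_faulty, observations)
print(result, observations)
```

With these inputs the surge candidate is discarded once r is found to be working, and the bulb-break candidate is confirmed by the test of b, which is the sequence of events described in Example 4.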

Now let us consider a different scenario: 
Example 5. Let Γ_0 and observation O_1 be as in Example 4 and suppose that
the program’s first call to Candidate_Diag returns PD_1, b is found to be faulty,
obs(ab(b), 1) is added to O_1, and Find_Diag returns PD_1. The agent proceeds
to have b repaired but, to his disappointment, discovers that b is still not on!
Intuitively this means that PD_1 is a wrong diagnosis - there must have been a
power surge at 0.

The example shows that, in order to find a correct explanation of a symptom, it
is essential for an agent to repair damaged components and observe the behavior
of the system after repair. For simplicity we assume that, similar to testing,
repair occurs in a well-controlled environment, i.e. no exogenous actions happen
during the repair process. To formally model this process we introduce a special
action, repair(c), for every component c of S. The effect of this action will be
defined by the causal law:

causes(repair(c), ¬ab(c), [])

The diagnostic process will now be modeled by the following algorithm. (Here
𝒮 = ⟨Γ_{n-1}, O⟩ and {obs(f_i, k)} is a collection of observations the diagnostician
makes to test his repair at moment k.)

procedure Diagnose(𝒮);
  k := n;
  while Need_Diag(𝒮) do
    ⟨E, Δ⟩ := Find_Diag(𝒮);
    if Δ = ∅ then
      no diagnosis
    else
      Repair(Δ);
      O := O ∪ {hpd(repair(c), k) : c ∈ Δ};
      k := k + 1;
      O := O ∪ {obs(f_i, k)}.
    end
  end

Example 6. To illustrate the above algorithm let us go back to the agent from
Example 5 who just discovered diagnosis D_1. He will repair the bulb and check
if the bulb is lit. It is not, and therefore a new observation is recorded as follows:

O_1 := O_1 ∪ {hpd(repair(b), 1), obs(¬on(b), 2)}

Need_Diag(𝒮) will detect a continued need for diagnosis, Find_Diag(𝒮) will
return D_2, which, after new repair and testing, will hopefully prove to be the
right diagnosis.
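
The repair loop of Examples 5 and 6 can be sketched as follows. Everything outside the loop itself is an assumption made for illustration: Need_Diag, Find_Diag, Repair and the test observations are passed in as functions, the stubs below script a world in which a power surge has damaged both r and the unprotected bulb, and the function returns None when no diagnosis is found, a case the pseudocode above leaves open.

```python
def diagnose(need_diag, find_diag, repair, observe, history, k):
    """Transcription of Diagnose; `history` is a list of ("obs" | "hpd", item, time)."""
    while need_diag(history):
        explanation, delta = find_diag(history)
        if not delta:
            return None                       # assumption: give up when no diagnosis exists
        repair(delta)
        history.extend(("hpd", f"repair({c})", k) for c in sorted(delta))
        k += 1
        history.extend(observe(k))            # observations made to test the repair
    return history


# Scripted stubs: a surge at time 0 damaged both the relay and the unprotected bulb.
faults = {"r", "b"}
scripted = iter([({"brk"}, {"b"}), ({"srg"}, {"r"})])   # successive answers of Find_Diag


def need_diag(history):
    kind, item, _ = history[-1]
    return kind == "obs" and item == "-on(b)"  # the bulb was last seen to be off


def repair(delta):
    faults.difference_update(delta)


def observe(k):
    return [("obs", "-on(b)" if faults else "on(b)", k)]


history = [("hpd", "close(s1)", 0), ("obs", "-on(b)", 1)]
final = diagnose(need_diag, lambda h: next(scripted), repair, observe, history, k=1)
for statement in final:
    print(statement)
```

In this scripted run the first repair (of the bulb) does not light the bulb, so the loop asks for a second diagnosis and repairs the relay, after which the bulb is observed to be on and the loop stops.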

The diagnosis produced by the above algorithm can be viewed as a reason- 
able interpretation of discrepancies between the agent’s predictions and actual 
observations. To complete our analysis of step 1 of the agent’s acting and reason- 
ing loop we need to explain how this interpretation can be incorporated in the 
agent’s history. If the diagnosis discovered is unique then the answer is obvious
- O is simply added to Γ_{n-1}. If, however, faults of the system components can be
caused by different sets of exogenous actions, the situation becomes more subtle.
Complete investigation of the issues involved is the subject of further research.

5 Related Work 

There are numerous papers on diagnosis, many of which substantially
influenced the authors’ views on the subject. The roots of our approach go back
to [15], where diagnosis for a static environment was formally defined in logical
terms. Recent expansions of this work [17,12,3], which take into account the
dynamics of the system’s behavior, served as the starting point of the work presented
in this paper. We

1. substantially simplified the basic definitions of [3]; 

2. presented reasonably efficient and provably correct algorithms for computing
‘dynamic’ diagnosis;

3. showed how to combine diagnostics with planning and other activities of a 
reasoning agent. 

The simplification of basic definitions from [3] is achieved by a careful choice of the
‘history description’ language: AC seems to be more suitable for our purposes
than C used in [3]. The reasoning algorithms are based on recent discoveries of a
close relationship between A-Prolog and reasoning about effects of actions [11]
and the ideas from answer set programming [10,13,9]. This approach of course
would be impossible without the existence of efficient answer set reasoning systems.
Finally, the integration of a diagnostic and other activities is based on the agent 
architecture from [1]. 

6 Conclusion 

The paper describes ongoing work on the development of a diagnostic problem
solving agent in A-Prolog. In particular we are looking for good modeling
techniques with clear and provably correct algorithms. The following can be of
interest to people who share these interests: 

• definitions of a symptom, candidate diagnosis, and diagnosis which we believe 
to be substantially simpler than other similar approaches; 

• a new algorithm for computing candidate diagnoses. (The algorithm is based 
on answer set programming and views the search for candidate diagnoses as 
‘planning in the past’); 

• a simple account of diagnostics, testing and repair based on the use of answer 
set solvers. 

In the full paper we plan to give mathematical analysis of correctness of the 
corresponding algorithms and test them on medium size examples. 
Abstract. In this paper we present a declarative approach to adding
domain-dependent control knowledge for Answer Set Planning (ASP). 
Our approach allows different types of domain-dependent control knowl- 
edge such as hierarchical, temporal, or procedural knowledge to be rep- 
resented and exploited in parallel, thus combining the ideas of control 
knowledge in HTN-planning, GOLOG-programming, and planning with 
temporal knowledge into ASP. To do so, we view domain-dependent con- 
trol knowledge as sets of independent constraints. An advantage of this 
approach is that domain-dependent control knowledge can be modularly 
formalized and added to the planning problem as desired. We define a set 
of constructs for constraint representation and provide a set of domain- 
independent logic programming rules for checking constraint satisfaction. 



1 Introduction 

Planning is hard. The complexity of classical planning is known to be PSPACE- 
complete for finite domains and undecidable in the general case [8,12]. By fixing 
the length of plans, the planning problem reduces to NP-complete or worse. 
Planning systems such as FF [16], HSP [6], Graphplan [5], and Blackbox [18]
have greatly improved the performance of their systems on benchmark planning 
problems by exploiting domain-independent search heuristics, clever encodings 
of knowledge, and efficient data structures [30]. Nevertheless, despite
impressive improvements in performance, there is a growing belief that planners that
exploit domain-dependent control knowledge may provide the key to future
performance gains [30]. This conjecture is supported by the impressive performance
of planners such as TLPlan [1], TALplan [11] and SHOP [26], all of which exploit 
domain-dependent control knowledge. 
A central issue in incorporating domain-dependent control knowledge into
a planner is to identify the classes of knowledge to incorporate and to devise
a means of representing and reasoning with this knowledge. In the past,
planners such as TLPlan and TALplan have exploited domain-dependent
temporal knowledge; SHOP and various hierarchical task network (HTN) planners
have exploited domain-dependent hierarchical and partial-order knowledge; and
satisfiability-based planners such as Blackbox have experimented with a variety
of domain-dependent control knowledge encoded as propositional formulae. In
this paper, we propose to exploit temporal knowledge and hierarchical knowl- 
edge as well as, what we refer to as, procedural knowledge within the paradigm 
of answer set planning. We show how these classes of domain-dependent control 
knowledge can be represented using a normal logic program and how they can 
be exploited by a basic answer set planner. We demonstrate the improvement in 
the efficiency of our answer set planner. 

The set of programming language constructs provided by the logic
programming language GOLOG (e.g., sequence (;), if-then-else, while, etc.) [20] provides
an example of the class of procedural knowledge we incorporate into our
planner. For example, a procedural constraint written as a1; a2; (a3 | a4 | a5); f? tells
the planner that it should make a plan where a1 is the first action, a2 is the
second action and then it should choose one of a3, a4 or a5 such that after their
execution f will be true. This type of domain-dependent control knowledge is
different from temporal knowledge where plans are restricted to action sequences 
that agree with a given set of temporal formulas. Procedural knowledge is also 
different from hierarchical and partial-order constraints where tasks are divided 
into smaller tasks, with some partial ordering and other constraints between 
them. These three classes of domain-dependent control knowledge differ in their 
structure and while there may be transformations available between one form 
and another, it is often natural for a user to express knowledge in a particular 
form. 
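
To make the reading of a procedural constraint such as a1; a2; (a3 | a4 | a5); f? concrete, the sketch below checks whether a given action sequence satisfies a constraint built from sequence, choice, primitive actions and tests. It is an illustration only and not the logic programming encoding developed in the paper: the tuple representation of programs, the function names ends and satisfies, and the convention that a state is the set of fluents true in it are all assumptions.

```python
def ends(prog, plan, states, i):
    """Positions j such that prog can be executed along plan[i:j].

    prog   : ("act", a) | ("test", f) | ("seq", p1, ..., pk) | ("choice", p1, ..., pk)
    plan   : list of action names
    states : list of len(plan) + 1 sets; states[j] holds the fluents true after j actions
    """
    kind = prog[0]
    if kind == "act":
        return {i + 1} if i < len(plan) and plan[i] == prog[1] else set()
    if kind == "test":
        return {i} if prog[1] in states[i] else set()    # a test consumes no action
    if kind == "choice":
        return set().union(*(ends(p, plan, states, i) for p in prog[1:]))
    if kind == "seq":
        current = {i}
        for p in prog[1:]:
            current = set().union(*(ends(p, plan, states, j) for j in current))
        return current
    raise ValueError(f"unknown construct: {kind}")


def satisfies(prog, plan, states):
    """True if the whole plan is one possible execution of prog."""
    return len(plan) in ends(prog, plan, states, 0)


constraint = ("seq", ("act", "a1"), ("act", "a2"),
              ("choice", ("act", "a3"), ("act", "a4"), ("act", "a5")),
              ("test", "f"))
trajectory = [set(), set(), set(), {"f"}]                # f becomes true after the third action
print(satisfies(constraint, ["a1", "a2", "a4"], trajectory))
print(satisfies(constraint, ["a1", "a3", "a4"], trajectory))
```

The first plan follows the constraint and ends in a state where f holds; the second violates it because its second action is not a2.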

To exploit the above classes of domain-dependent planning constraints we 
use the declarative problem-solving paradigm exemplified by satisfiability-based 
planners. We refer to such an approach to planning as model-based planning, to 
indicate that plans are models of the logical theory describing the planning prob- 
lem. One advantage of this approach is that planner development is divided into 
two parts: development of model generators for logical languages, and planner 
encoding as a logical theory. This enables those developing logical encodings of 
model-based planning problems to exploit the diversity of domain-independent 
model generators being developed for different tasks. 

In this paper, we use an answer set programming approach to model-based
planning. We use logic programming as the logical language to encode our
model-based planning problem. From a knowledge representation perspective, there are
many advantages to a logic programming encoding, as compared to a simple
propositional logic encoding. These include: parsimonious encoding of solutions
to the frame problem in the presence of qualification and ramification constraints;
the presence of the non-classical ‘←’ operator that not only helps in encoding
causality but also can be exploited when searching for models; and many
fundamental theoretical results [3] that help construct proofs of the correctness of
encodings. In contrast, few of the encodings of satisfiability-based planners have 
proofs of correctness, while most logic programming encodings are accompanied 
by a proof of correctness. From the perspective of computation, planners based 
on propositional encodings still fare better. There are currently more implemen- 
tations of propositional solvers than of logic programming answer set generators, 
and the best propositional solvers tend to be faster than the best answer set gen- 
erators. 

The rest of this paper is organized as follows: we will review the basics of 
action language and answer set planning in the next section. We then introduce 
different constructs for domain-dependent control knowledge representation. For 
each construct, we provide a set of logic programming rules as its implementation 
(Subsections 3.1-3.3). We use Smodels, an implemented system for computing
stable models of logic programs [27], in our experiments. As such, the rules
developed in this paper are written in Smodels syntax and can be used as input
to the Smodels program.¹ In Subsection 3.4, we describe some experimental results
and conclude in Section 4.

2 Preliminaries 

2.1 Action Theories 

We use the high-level action description language B of [15] to represent action 
theories. In such a language, an action theory consists of two finite, disjoint sets 
of names called actions and fluents. Actions transition the system from one state 
to another. Fluents are propositions whose truth value can change as the result 
of actions. Unless otherwise stated, $a$ is used to denote an action; $f$ and $p$ are 
used to denote fluents. The action theory also comprises a set of propositions of 
the following form: 

$caused(\{p_1, \dots, p_n\}, f)$ (1) 

$causes(a, f, \{p_1, \dots, p_n\})$ (2) 

$executable(a, \{p_1, \dots, p_n\})$ (3) 

$initially(f)$ (4) 

where $f$ and the $p_i$'s are fluent literals (a fluent literal is either a fluent $g$ or its 
negation $\neg g$, written as $neg(g)$) and $a$ is an action. (1) represents a static causal 
law, i.e., a ramification constraint. It conveys that whenever the fluent literals 
$p_1, \dots, p_n$ hold, so does $f$. (2), referred to as a dynamic causal law, represents 
the (conditional) effect of $a$. Intuitively, a proposition of the form (2) states 
that $f$ is guaranteed to be true after the execution of $a$ in any state of the world 
where $p_1, \dots, p_n$ are true. (3) captures an executability condition of $a$. It says 
that $a$ is executable in a state in which $p_1, \dots, p_n$ hold. Finally, propositions of 
the form (4) are used to describe the initial state. (4) states that $f$ holds in the 
initial state. 

¹ Although we use Smodels, we believe that the code presented here could easily be 
used with DLV [9], following simple modifications to reflect differences in syntax. 

An action theory is a pair $(D, \Gamma)$ where $D$ consists of propositions of the 
form (1)-(3) and $\Gamma$ consists of propositions of the form (4). For the purpose 
of this paper, it suffices to note that the semantics of such an action theory is 
given by a transition graph, represented by a relation $t$, whose nodes are the 
alternative (complete) states of the action theory and whose links (labeled with 
actions) represent the transitions between its states (see details in [15]). That is, 
if $(s, a, s') \in t$, then there exists a link with label $a$ from state $s$ to state $s'$. 

A trajectory of the system is denoted by a sequence $s_0 a_1 s_1 \dots a_n s_n$ where the $s_i$'s 
are states, the $a_i$'s are actions, and $(s_i, a_{i+1}, s_{i+1}) \in t$ for $i \in \{0, \dots, n-1\}$. 
$s_0 a_1 s_1 \dots a_n s_n$ is a trajectory of a fluent formula $\Delta$ if $\Delta$ holds in $s_n$. 

In this paper, we will assume that $\Gamma$ is complete, i.e., for every fluent $f$, either 
$initially(f)$ or $initially(neg(f))$ belongs to $\Gamma$. We will also assume that $(D, \Gamma)$ 
is consistent in the sense that there exists a non-empty relation $t$ representing 
the transition graph of $(D, \Gamma)$. 
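
The notions of transition relation and trajectory can be illustrated with a small Python sketch. The toy domain (a single fluent `on` toggled by a hypothetical action `flip`), the function names, and the representation of states as frozensets of fluent literals are assumptions made purely for illustration; they are not part of the language B or of the encoding used in this paper.

```python
# States are frozensets of fluent literals; "neg(on)" is the negation of "on".
S_OFF = frozenset({"neg(on)"})
S_ON = frozenset({"on"})

# Hypothetical transition relation t: a set of (state, action, state) triples.
t = {
    (S_OFF, "flip", S_ON),
    (S_ON, "flip", S_OFF),
}


def is_trajectory(states, actions, t):
    """Check that s_0 a_1 s_1 ... a_n s_n is a trajectory, i.e. that
    (s_i, a_{i+1}, s_{i+1}) is in t for every i in {0, ..., n-1}.

    `actions[i]` plays the role of a_{i+1} (Python lists are 0-indexed).
    """
    if len(states) != len(actions) + 1:
        return False
    return all(
        (states[i], actions[i], states[i + 1]) in t
        for i in range(len(actions))
    )


def is_trajectory_of(goal_literals, states, actions, t):
    """A trajectory is a trajectory of a goal if the goal holds in its last
    state. The goal is simplified here to a set of fluent literals."""
    return is_trajectory(states, actions, t) and set(goal_literals) <= states[-1]
```

For instance, `is_trajectory_of({"on"}, [S_OFF, S_ON], ["flip"], t)` holds in this toy domain, whereas flipping twice returns to `S_OFF` and no longer satisfies the goal.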

2.2 Answer Set Planning 

A planning problem is specified by a triple $(D, \Gamma, \Delta)$ where $(D, \Gamma)$ is an action 
theory and $\Delta$ is a fluent formula (or goal), representing the goal state. A 
sequence of actions $a_1, \dots, a_m$ is a possible plan for $\Delta$ if there exists a trajectory 
$s_0 a_1 s_1 \dots a_m s_m$ such that $s_0$ and $s_m$ satisfy $\Gamma$ and $\Delta$, respectively². 

Given a planning problem $(D, \Gamma, \Delta)$, answer set planning solves it by translating 
it into a logic program $\Pi(D, \Gamma, \Delta)$ (or $\Pi$, for short) consisting of domain-dependent 
rules that describe $D$, $\Gamma$, and $\Delta$ respectively, and domain-independent 
rules that generate action occurrences and represent the transitions between 
states. 

• Goal representation. To encode A, we define formulas and provide a set 
of rules for formula evaluation. We consider formulas that are bounded classical 
formulas with each bound variable associated with a sort. They are formally 
defined as follows. 

- A literal is a formula. 

- The negation of a formula is a formula. 

- A finite conjunction of formulas is a formula. 

- A finite disjunction of formulas is a formula. 

- If $X_1, \dots, X_n$ are variables that can have values from the sorts $s_1, \dots, s_n$, 
  and $f_1(X_1, \dots, X_n)$ is a formula then $\forall X_1, \dots, X_n . f_1(X_1, \dots, X_n)$ is a formula. 

- If $X_1, \dots, X_n$ are variables that can have values from the sorts $s_1, \dots, s_n$, 
  and $f_1(X_1, \dots, X_n)$ is a formula then $\exists X_1, \dots, X_n . f_1(X_1, \dots, X_n)$ is a formula. 

² Note that the notion of plan employed here is weaker than the conventional one 
where the goal must be achieved on every possible trajectory. This is because an 
action theory with causal laws can be non-deterministic. Note however, that if $D$ 
is deterministic, i.e., for every pair $(s, a)$ there exists at most one state $s'$ such that 
$(s, a, s') \in t$, then every possible plan for $\Delta$ is also a plan for $\Delta$. 

A sort called formula is introduced and each non-atomic formula is associated 
with a unique name and defined by (possibly) a set of rules. For example, the conjunction 
$f \wedge g \wedge h$ is represented by the set of atoms $\{conj(f'), in(f, f'), in(g, f'), in(h, f')\}$ 
where $f'$ is the name assigned to $f \wedge g \wedge h$; $\forall X_1, \dots, X_n . f_1(X_1, \dots, X_n)$ 
can be represented by the rule 

$formula(forall(f, f_1(X_1, \dots, X_n))) \leftarrow in(X_1, s_1), \dots, in(X_n, s_n)$ 

where $f$ is the name assigned to the formula. In keeping with previous notation, 
negation is denoted by the function symbol $neg$. For example, if $f$ is the name of 
a formula then $neg(f)$ is a formula denoting its negation. Rules to check when a 
formula holds or does not hold can be written in a straightforward manner and 
are omitted here to save space. (Details can be downloaded from the Web³.) 

• Action theory representation. Since each set of literals $\{p_1, \dots, p_n\}$ in 
(1)-(3) can be represented by a conjunction of literals, $D$ can be encoded as a 
set of facts of $\Pi$ as follows. First, we assign to each set of fluent literals that 
occurs in a proposition of $D$ a distinguished name. The constant $nil$ denotes the 
set $\{\}$. A set of literals $\{p_1, \dots, p_n\}$ will be replaced by the set of atoms $Y = 
\{conj(s), in(p_1, s), \dots, in(p_n, s)\}$ where $s$ is the name assigned to $\{p_1, \dots, p_n\}$. 
With this representation, propositions in $D$ can be easily translated into a set 
of facts of $\Pi$. For example, a proposition $causes(a, f, \{p_1, \dots, p_n\})$ with $n > 0$ is 
encoded as a set of atoms consisting of $causes(a, f, s)$ and the set $Y$ ($s$ is the 
name assigned to $\{p_1, \dots, p_n\}$). 
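
As a rough illustration of this translation, the following Python sketch generates the facts for a dynamic causal law. The helper names, the example action `shoot`, and the set name `s1` are hypothetical; the facts are printed with a trailing period in the style of Smodels input, and the use of `nil` for an empty precondition set is an assumption based on the remark that `nil` denotes the empty set.

```python
def encode_literal_set(name, literals):
    """Return the atoms conj(s), in(p_1, s), ..., in(p_n, s) for a set named s."""
    return [f"conj({name})."] + [f"in({p},{name})." for p in literals]


def encode_causes(action, effect, name, preconditions):
    """Encode causes(a, f, {p_1, ..., p_n}) as causes(a, f, s) plus the set Y."""
    if not preconditions:
        # Assumption of this sketch: an empty precondition set is named nil.
        return [f"causes({action},{effect},nil)."]
    return [f"causes({action},{effect},{name})."] + encode_literal_set(
        name, preconditions
    )


# Hypothetical law: shooting causes neg(alive) whenever loaded holds.
facts = encode_causes("shoot", "neg(alive)", "s1", ["loaded"])
for fact in facts:
    print(fact)
```

Static causal laws and executability conditions could be handled in the same way, reusing `encode_literal_set` for their sets of literals.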

• Domain independent rules. The domain independent rules of $\Pi$ are adapted 
mainly from [14,10,21,22]. The main predicates in these rules are: 

- $holds(L, T)$: $L$ holds at time $T$, 
- $possible(A, T)$: action $A$ is executable at time $T$, 
- $occ(A, T)$: action $A$ occurs at time $T$, and 
- $hf(\varphi, T)$: formula $\varphi$ holds at time $T$. 

The main rules are given next. In these rules, $T$ is a variable of the sort time, $L$ and $G$ 
are variables denoting fluent literals (written as $F$ or $neg(F)$ for some fluent $F$), $S$ 
is a variable of the sort conj (conjunction), and $A$ and $B$ are variables of the sort 
action. 



$holds(L, T+1) \leftarrow occ(A, T), causes(A, L, S), hf(S, T).$ (5) 
$holds(L, T) \leftarrow caused(S, L), hf(S, T).$ (6) 
$holds(L, T+1) \leftarrow contrary(L, G), holds(L, T), not\ holds(G, T+1).$ (7) 
$possible(A, T) \leftarrow executable(A, S), hf(S, T).$ (8) 
$holds(L, 0) \leftarrow literal(L), initially(L).$ (9) 
$nocc(A, T) \leftarrow A \neq B, occ(B, T), T < length.$ (10) 
$occ(A, T) \leftarrow T < length, possible(A, T), not\ nocc(A, T).$ (11) 

³ http://www.cs.nmsu.edu/~tson/asp_planner 

Here, (5) encodes the effects of actions, (6) encodes the effects of static causal 
laws, and (7) is the inertial rule. (8) defines a predicate that determines when an 
action can occur and (9) encodes the initial situation. (lO)-(ll) generate action 
occurrences, one at a time. We omit most of the auxiliary rules such as rules 
for defining contradictory literals etc. The source code and examples can be 
retrieved from our Web site. 
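
To make the interplay of the effect rule (5) and the inertia rule (7) concrete, here is a minimal Python sketch that computes a successor state directly. It is not the Smodels program: static causal laws (rule (6)) are deliberately left out, the direct effects of an action are assumed to be mutually consistent, and the shooting-style domain with actions `load` and `shoot` is a hypothetical example chosen for illustration.

```python
def contrary(literal):
    """Return the complementary fluent literal (f <-> neg(f))."""
    if literal.startswith("neg(") and literal.endswith(")"):
        return literal[4:-1]
    return f"neg({literal})"


def successor(state, action, causes):
    """Compute the next state for `action`, in the spirit of rules (5) and (7).

    `causes` is a list of (action, effect, preconditions) triples. Static
    causal laws (rule (6)) are not handled by this sketch.
    """
    # Rule (5): direct effects whose preconditions hold in the current state.
    effects = {
        effect
        for act, effect, pre in causes
        if act == action and set(pre) <= state
    }
    # Rule (7): a literal persists unless its contrary holds in the next state.
    inertia = {lit for lit in state if contrary(lit) not in effects}
    return frozenset(effects | inertia)


# Hypothetical dynamic causal laws.
causes = [
    ("load", "loaded", []),
    ("shoot", "neg(alive)", ["loaded"]),
    ("shoot", "neg(loaded)", ["loaded"]),
]
s0 = frozenset({"alive", "neg(loaded)"})
s1 = successor(s0, "load", causes)
s2 = successor(s1, "shoot", causes)
print(sorted(s1), sorted(s2))
```

Shooting in `s0`, where the gun is not loaded, leaves the state unchanged, which is exactly the behaviour that inertia is meant to capture.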

Let $\Pi_n(D, \Gamma, \Delta)$ (or $\Pi_n$ when it is clear from the context what $D$, $\Gamma$, and $\Delta$ 
are) be the logic program consisting of 

- the set of domain-independent rules in which the domain of $T$ is $\{0, \dots, n\}$, 
- the set of atoms encoding $D$ and $\Gamma$, and 
- the rule $\leftarrow not\ hf(\Delta, n)$ that encodes the requirement that $\Delta$ holds at $n$. 

The following result (adapted from [22]) shows the equivalence between trajectories 
of $\Delta$ and stable models of $\Pi_n$. Let $S$ be a stable model of $\Pi_n$; define 
$s(i) = \{f \mid holds(f, i) \in S\}$ and $A[i, j] = [a_i, \dots, a_j]$ where $i \le j$ are integers, $f$ 
is a fluent, the $a_t$'s are actions, and $occ(a_t, t) \in S$ for every $t$ with $i \le t \le j$. 

Theorem 1. For a planning problem $(D, \Gamma, \Delta)$, 

- if $s_0 a_0 \dots a_{n-1} s_n$ is a trajectory of $\Delta$, then there exists a stable model $S$ of 
  $\Pi_n$ such that $A[0, n-1] = [a_0, \dots, a_{n-1}]$ and $s_i = s(i)$ for $i \in \{0, \dots, n\}$, 
  and 

- if $S$ is a stable model of $\Pi_n$ with $A[0, n-1] = [a_0, \dots, a_{n-1}]$ then 
  $s(0) a_0 \dots a_{n-1} s(n)$ is a trajectory of $\Delta$. 

3 Control Knowledge as Constraints 

In this section, we add domain-dependent control knowledge to ASP by viewing 
it as constraints on the stable models of the program $\Pi$. For each type of control 
knowledge⁴, we introduce new constructs for its encoding and present a set of 
rules that check when a constraint is satisfied. 

3.1 Temporal Knowledge 

In [1], temporal knowledge is used to prune the search space. Temporal constraints 
are specified using a linear temporal logic with a precisely defined semantics. 
It is easy to add them to (or remove them from) a planning problem 
since their representation is separate from the action and goal representation. 
Planners exploiting temporal knowledge to control search have proven to be 
highly efficient and to scale up well [2]. In this paper, we represent temporal 
knowledge using temporal formulas. In our notation, a temporal formula is either 

- a formula (as defined in the previous section), or 
- a formula of the form $until(\varphi, \psi)$, $always(\varphi)$, $eventually(\varphi)$, or $next(\varphi)$, 
  where $\varphi$ and $\psi$ are temporal formulas. 

⁴ We henceforth abbreviate domain-dependent control knowledge as control knowledge. 

For example, in a logistics domain, let P and L denote a package and its 
location, respectively. The following formula: 

$always((goal(P, L) \wedge at(P, L)) \rightarrow next(\neg holding(P)))$ (12) 

can be used to express that if the goal is to have a package at a particular 
location and if the package is indeed at that location then it’s always the case 
that the agent will not be holding the package in the next state. This has the 
effect of preventing the agent from picking up the package once it’s at its goal 
location. 

Like non-atomic formulas, temporal formulas can be encoded in ASP using 
constants, atoms, and rules. For example, the formula $until(f, next(g))$ is represented 
by the set of atoms $\{tf(n_1, next(g)), tf(n_2, until(f, n_1))\}$ where $tf$ stands 
for "temporal formula" and $n_1$ and $n_2$ are the new constants assigned to $next(g)$ 
and $until(f, next(g))$, respectively. The semantics of these temporal operators is 
the standard one. 

To complete the encoding of temporal constraints, we provide the rules for 
temporal formula evaluation. The key rules, which define the satisfiability of a 
temporal formula $N$ at time $T$ ($htf(N, T)$) and between $T$ and $T'$ ($hd(N, T, T')$), 
are given below. 

$htf(N, T) \leftarrow formula(N), hf(N, T).$ (13) 
$hf(N, T) \leftarrow tf(N, N_1), htf(N_1, T).$ (14) 
$htf(N, T) \leftarrow tf(N, until(N_1, N_2)), hd(N_1, T, T'), htf(N_2, T').$ (15) 
$htf(N, T) \leftarrow tf(N, always(N_1)), hd(N_1, T, length+1).$ (16) 
$htf(N, T) \leftarrow tf(N, eventually(N_1)), htf(N_1, T'), T \le T'.$ (17) 
$htf(N, T) \leftarrow tf(N, next(N_1)), htf(N_1, T+1).$ (18) 
$not\_hd(N, T, T') \leftarrow not\ htf(N, T''), T \le T'' < T'.$ (19) 
$hd(N, T, T') \leftarrow htf(N, T), not\ not\_hd(N, T, T').$ (20) 
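
The evaluation of temporal formulas can also be mimicked outside ASP. The Python sketch below evaluates a formula over a finite list of states using the usual finite-trace reading of the operators. It is an illustration rather than a transcription of rules (13)-(20): formulas are nested tuples, the atoms `goal`, `at`, and `holding` are hypothetical propositional stand-ins for the logistics example, and `next` is treated as vacuously true in the last state, which is an assumption of this sketch.

```python
def holds(formula, states, t):
    """Evaluate a temporal formula at time t over a finite list of states."""
    op = formula[0]
    if op == "atom":
        return formula[1] in states[t]
    if op == "not":
        return not holds(formula[1], states, t)
    if op == "and":
        return all(holds(f, states, t) for f in formula[1:])
    if op == "implies":
        return (not holds(formula[1], states, t)) or holds(formula[2], states, t)
    if op == "next":
        # Assumption of this sketch: next is vacuously true in the last state.
        return t + 1 >= len(states) or holds(formula[1], states, t + 1)
    if op == "always":
        return all(holds(formula[1], states, u) for u in range(t, len(states)))
    if op == "eventually":
        return any(holds(formula[1], states, u) for u in range(t, len(states)))
    if op == "until":
        return any(
            holds(formula[2], states, u)
            and all(holds(formula[1], states, v) for v in range(t, u))
            for u in range(t, len(states))
        )
    raise ValueError(f"unknown operator: {op}")


# Propositional version of constraint (12).
constraint = (
    "always",
    ("implies",
     ("and", ("atom", "goal"), ("atom", "at")),
     ("next", ("not", ("atom", "holding")))),
)

# States after: pick up, move, drop at the goal location ...
good = [{"goal"}, {"goal", "holding"}, {"goal", "holding"}, {"goal", "at"}]
# ... and after picking the package up again at its goal location.
bad = good + [{"goal", "holding"}]

print(holds(constraint, good, 0), holds(constraint, bad, 0))
```

The second trajectory violates the constraint because the package is held again right after it has reached its goal location.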



Having defined temporal constraints and specified when they are satisfied, 
adding temporal knowledge to a planning problem in ASP is easy. We must: (i) 
encode the knowledge as a temporal formula, say $\varphi$; (ii) add the rules (13)-(20) to 
$\Pi$; and (iii) add the constraint $\leftarrow not\ htf(\varphi, 0)$ to $\Pi$. Step (iii) eliminates models 
of $\Pi$ in which $\varphi$ does not hold. For example, if $\Pi$ is the program for planning 
in the logistics domain, adding the constraint (12) to $\Pi$ will eliminate all models 
whose corresponding trajectory admits an action occurrence that causes 
$holding(P)$ to be true after $P$ is delivered at its destination. As a concrete example, 
given the goal formula $at(p, l_2)$, there exists no model of $\Pi$ that corresponds 
to the sequence of actions $pick\_up(p, l_1)$, $move(l_1, l_2)$, $drop(p, l_2)$, $pick\_up(p, l_2)$. 
(We appeal to the reader for the intuitive meaning of the effects of actions, the 
initial setting, and the goal of the problem.) 

3.2 Procedural Knowledge 

Procedural knowledge can be thought of as an (under-specified) sketch of the 
plans to be generated. This type of control knowledge has been used in GOLOG, 
an Algol-like logic programming language for agent programming, control and 
execution, based on a situation calculus theory of actions [20]. GOLOG has 
been primarily used as a programming language for high-level agent control in 
dynamical environments (see e.g. [7]). More recently, GOLOG has been used for 
general planning [13]. In the planning context, a GOLOG program specifies an 
arbitrarily incomplete plan that includes non-deterministic choice points that 
are filled in by the planner (the deductive machinery of a GOLOG interpreter). 
For example, a simple GOLOG program $a_1; a_2; (a_3 | a_4 | a_5); f?$ represents plans 
which have $a_1$ followed by $a_2$, followed by one of $a_3$, $a_4$, or $a_5$ such that $f$ is true 
upon termination of the plan. The interpreter, when asked for a solution to this 
program, needs only to decide which one of $a_3$, $a_4$, or $a_5$ it should choose. To 
encode procedural knowledge, we introduce a set of Algol-like constructs such as 
sequence, loop, conditional, and nondeterministic choice of arguments/actions. 
These constructs are used to encode partial procedural control knowledge in the 
form of programs. For an action theory $(D, \Gamma)$, we define a program inductively 
as follows. 

- an action $a$ is a program, 
- a formula $\varphi$ is a program⁵, 
- if the $p_i$'s are programs then $p_1; \dots; p_n$ is a program, 
- if the $p_i$'s are programs then $p_1 | \dots | p_n$ is a program, 
- if $p_1$ and $p_2$ are programs and $\varphi$ is a formula then "if $\varphi$ then $p_1$ else $p_2$" 
  is a program, 
- if $p$ is a program and $\varphi$ is a formula then "while $\varphi$ do $p$" is a program, 
  and 
- if $X$ is a variable of sort $s$, $p(X)$ is a program, and $f(X)$ is a formula, then 
  $pick(X, f(X), p(X))$ is a program. 

As is common practice with Smodels, we will assign to each program a name 
(with the exception of actions and formulas), provide rules for the construction 
of programs, and use prefix notation. A sequence $\alpha = p_1; \dots; p_n$ will be 
represented by the atoms $proc(p)$, $head(p, n_1)$, $tail(p, n_2)$ and the set of atoms 
representing $p_2; \dots; p_n$, where $p$, $n_1$, and $n_2$ are the names assigned to $\alpha$, $p_1$ (if 
it is not a primitive action or a formula), and $p_2; \dots; p_n$, respectively. 

⁵ This is analogous to the GOLOG test action $f?$ which tests the truth value of a 
fluent. 

The operational semantics of programs specifies when a trajectory 
$s_0 a_0 s_1 \dots a_{n-1} s_n$, denoted by $\alpha$, is a trace of a program $p$ and is defined as follows. 

- for $p = a$ where $a$ is an action, $n = 1$ and $a_0 = a$, 
- for $p = \varphi$, $n = 0$ and $\varphi$ holds in $s_0$, 
- for $p = p_1; p_2$, there exists an $i$ such that $s_0 a_0 \dots s_i$ is a trace of $p_1$ and 
  $s_i a_i \dots s_n$ is a trace of $p_2$, 
- for $p = p_1 | \dots | p_n$, $\alpha$ is a trace of $p_i$ for some $i \in \{1, \dots, n\}$, 
- for $p =$ if $\varphi$ then $p_1$ else $p_2$, $\alpha$ is a trace of $p_1$ if $\varphi$ holds in $s_0$ or $\alpha$ is a 
  trace of $p_2$ if $neg(\varphi)$ holds in $s_0$, 
- for $p =$ while $\varphi$ do $p_1$, $n = 0$ and $neg(\varphi)$ holds in $s_0$, or $\varphi$ holds in $s_0$ and 
  there exists some $i$ such that $s_0 a_0 \dots s_i$ is a trace of $p_1$ and $s_i a_i \dots s_n$ is a 
  trace of $p$, and 
- for $p = pick(X, f(X), q(X))$, there exists a constant $x$ of the sort of $X$ 
  such that $f(x)$ holds in $s_0$ and $\alpha$ is a trace of $q(x)$. 
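
The trace conditions listed above can be read almost directly as a recursive check. The following Python sketch is one possible reading, not the ASP encoding: programs are nested tuples, formulas are Python predicates over states, and `pick` receives an explicit finite domain; all of these representation choices are assumptions. In the `while` case the body is required to consume at least one step, so that the recursion terminates.

```python
def is_trace(p, states, actions, i, j):
    """Check whether the segment s_i a_i ... s_j is a trace of program p."""
    kind = p[0]
    if kind == "act":
        return j == i + 1 and actions[i] == p[1]
    if kind == "test":
        return j == i and p[1](states[i])
    if kind == "seq":
        if len(p) == 2:
            return is_trace(p[1], states, actions, i, j)
        rest = ("seq",) + p[2:]
        return any(
            is_trace(p[1], states, actions, i, k)
            and is_trace(rest, states, actions, k, j)
            for k in range(i, j + 1)
        )
    if kind == "choice":
        return any(is_trace(q, states, actions, i, j) for q in p[1:])
    if kind == "if":
        _, cond, then_p, else_p = p
        branch = then_p if cond(states[i]) else else_p
        return is_trace(branch, states, actions, i, j)
    if kind == "while":
        _, cond, body = p
        if not cond(states[i]):
            return j == i
        # The body must make progress (k > i) so that the recursion terminates.
        return any(
            is_trace(body, states, actions, i, k)
            and is_trace(p, states, actions, k, j)
            for k in range(i + 1, j + 1)
        )
    if kind == "pick":
        _, domain, cond, body = p
        return any(
            cond(x)(states[i]) and is_trace(body(x), states, actions, i, j)
            for x in domain
        )
    raise ValueError(f"unknown program construct: {kind}")


# The program a1; a2; (a3 | a4 | a5); f? from the text.
program = (
    "seq",
    ("act", "a1"),
    ("act", "a2"),
    ("choice", ("act", "a3"), ("act", "a4"), ("act", "a5")),
    ("test", lambda s: "f" in s),
)
states = [set(), set(), set(), {"f"}]
print(is_trace(program, states, ["a1", "a2", "a4"], 0, 3))
```

Replacing the last action by `a1` makes the check fail, since `a1` is not one of the three admissible choices.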

The logic programming rules that realize this semantics follow. We define a 
predicate $trans(p, t_1, t_2)$ which holds in a stable model $S$ iff $s(t_1) a_{t_1} \dots s(t_2)$ is 
a trace of $p$⁶. 

$trans(P, T_1, T_2) \leftarrow proc(P), head(P, P_1), tail(P, P_2), trans(P_1, T_1, T_3), trans(P_2, T_3, T_2).$ (21) 
$trans(A, T, T+1) \leftarrow action(A), A \neq null, occ(A, T).$ (22) 
$trans(null, T, T) \leftarrow$ (23) 
$trans(N, T_1, T_2) \leftarrow choiceAction(N), in(P_1, N), trans(P_1, T_1, T_2).$ (24) 
$trans(F, T_1, T_1) \leftarrow formula(F), hf(F, T_1).$ (25) 
$trans(I, T_1, T_2) \leftarrow if(I, F, P_1, P_2), hf(F, T_1), trans(P_1, T_1, T_2).$ (26) 
$trans(I, T_1, T_2) \leftarrow if(I, F, P_1, P_2), not\ hf(F, T_1), trans(P_2, T_1, T_2).$ (27) 
$trans(W, T_1, T_2) \leftarrow while(W, F, P), hf(F, T_1), T_1 < T_3 \le T_2, trans(P, T_1, T_3), trans(W, T_3, T_2).$ (28) 
$trans(W, T, T) \leftarrow while(W, F, P), not\ hf(F, T).$ (29) 
$trans(S, T_1, T_2) \leftarrow choiceArgs(S, F, P), hf(F, T_1), trans(P, T_1, T_2).$ (30) 


Finding a valid instantiation of a program $P$ can be viewed as a planning 
problem $(D, \Gamma, \Delta)$ where $\Delta$ is the constraint $\leftarrow not\ trans(P, 0, n)$. Let $\Pi_n^P$ be the 
program obtained from $\Pi_n$ by (i) adding the rules (21)-(30), and (ii) replacing 
the goal constraint with $\leftarrow not\ trans(P, 0, n)$. The following theorem is similar 
to Theorem 1. 

Theorem 2. Let $(D, \Gamma)$ be an action theory and $P$ be a program. Then, (i) 
for every stable model $S$ of $\Pi_n^P$, $s(0) a_0 \dots a_{n-1} s(n)$ is a trace of $P$; and (ii) if 
$s_0 a_0 \dots a_{n-1} s_n$ is a trace of $P$ then there exists a stable model $S$ of $\Pi_n^P$ such 
that $s_j = s(j)$ and $occ(a_i, i) \in S$ for $j \in \{0, \dots, n\}$ and $i \in \{0, \dots, n-1\}$. 

⁶ Recall that we define $s(i) = \{holds(f, i) \in S \mid f \text{ is a fluent}\}$ and assume 
$occ(a_i, i) \in S$. 

3.3 HTN Knowledge 

GOLOG programs are good for representing procedural knowledge but prove 
cumbersome for encoding partial orderings between programs and do not allow 
temporal constraints. For example, to represent that any sequence containing 
the $n$ programs $p_1, \dots, p_n$, in which $p_1$ occurs before $p_2$, is a valid plan for 
a goal $\Delta$, one would need to list all the possible sequences and then use the 
non-deterministic construct⁷. This can be easily represented by an HTN consisting 
of the set $\{p_1, \dots, p_n\}$ and a constraint expressing that $p_1$ must occur 
before $p_2$. HTNs also allow maintenance constraints of the form $always(\varphi)$ to 
be represented. However, HTNs do not have complex constructs such as procedures, 
conditionals, or loops. Attempts to combine hierarchical constraints and 
GOLOG-like programs (e.g., [4]) have fallen short since they do not allow complex 
programs to occur within these HTN programs. We will show next that, 
under the ASP framework, this restriction can be eliminated by adding the following 
item to the definition of programs in the previous section. 

- If $p_1, \dots, p_n$ are programs then a pair $(S, C)$ is a program where 
  $S = \{p_1, \dots, p_n\}$ and $C$ is a set of ordering or truth constraints (defined below). 
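
To see why listing all admissible sequences by hand becomes cumbersome, the following Python sketch enumerates the orderings of a set of program names that respect a set of ordering constraints. The function name and the representation of constraints as pairs are assumptions made for illustration.

```python
from itertools import permutations


def valid_sequences(programs, before):
    """List every ordering of `programs` that respects the pairs in `before`.

    Each pair (x, y) in `before` requires x to occur earlier than y.
    """
    result = []
    for order in permutations(programs):
        position = {name: index for index, name in enumerate(order)}
        if all(position[x] < position[y] for x, y in before):
            result.append(order)
    return result


sequences = valid_sequences(["p1", "p2", "p3"], [("p1", "p2")])
for sequence in sequences:
    print(";".join(sequence))
```

With three programs and the single constraint that `p1` precedes `p2`, three sequences remain; with four programs there are already twelve, whereas the HTN-style pair $(S, C)$ states the same knowledge with one set and one constraint.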

Let $S = \{p_1, \dots, p_k\}$ be a set of programs. Assume that $n_i$, $1 \le i \le k$, is 
the name assigned to the program $p_i$. An ordering constraint over $S$ has the 
form $n_i \prec n_j$ where $n_i \neq n_j$, and a truth constraint is of the form $(n_i, \varphi)$, $(\varphi, n_i)$, 
or $(n_i, \varphi, n_t)$ where $\varphi$ is a formula. In our encoding, we will represent a program 
$(S, C)$ by an atom $htn(p, S_n, C_n)$ where $p$, $S_n$, and $C_n$ are the names assigned 
to $(S, C)$, $S$, and $C$ respectively. To complete our extension, we need to define 
when a trajectory is a trace of a program with the new construct and provide 
logic program rules for checking its satisfaction. A trajectory $s_0 a_0 \dots a_{n-1} s_n$ is a 
trace of a program $(S, C)$ if there exists a sequence $j_0 = 0 \le j_1 \le \dots \le j_k = n$ and 
a permutation $(i_1, \dots, i_k)$ of $(1, \dots, k)$ such that the sequence of trajectories 
$\alpha_1 = s_0 a_0 \dots s_{j_1}$, $\alpha_2 = s_{j_1} a_{j_1} \dots s_{j_2}$, $\dots$, $\alpha_k = s_{j_{k-1}} a_{j_{k-1}} \dots s_n$ satisfies the 
following conditions: 

- for each $l$, $1 \le l \le k$, $\alpha_l$ is a trace of $p_{i_l}$, 
- if $n_t \prec n_l \in C$ then $i_t < i_l$, 
- if $(\varphi, n_l) \in C$ (or $(n_l, \varphi) \in C$) then $\varphi$ holds in the state $s_{j_{l-1}}$ (or $s_{j_l}$), and 
- if $(n_t, \varphi, n_l) \in C$ then $\varphi$ holds in $s_{j_t}, \dots, s_{j_{l-1}}$. 

We will extend the predicate $trans$ to allow the new type of programs to be 
considered. Rules for checking the satisfaction of a program $htn(N, S, C)$ are 
given next. 

$trans(N, T_1, T_2) \leftarrow htn(N, S, C), not\ nok(N, T_1, T_2).$ (31) 
$1\{begin(N, I, T_3, T_1, T_2) : between(T_3, T_1, T_2)\}1 \leftarrow htn(N, S, C), in(I, S), trans(N, T_1, T_2).$ (32) 
$1\{end(N, I, T_3, T_1, T_2) : between(T_3, T_1, T_2)\}1 \leftarrow htn(N, S, C), in(I, S), trans(N, T_1, T_2).$ (33) 
$nok(N, T_1, T_2) \leftarrow htn(N, S, C), in(I, S), T_3 > T_4, begin(N, I, T_3, T_1, T_2), end(N, I, T_4, T_1, T_2).$ (34) 
$nok(N, T_1, T_2) \leftarrow htn(N, S, C), in(I, S), T_3 \le T_4, begin(N, I, T_3, T_1, T_2), end(N, I, T_4, T_1, T_2), not\ trans(I, T_3, T_4).$ (35) 
$nok(N, T_1, T_2) \leftarrow htn(N, S, C), not\ trans(N, T_1, T_2).$ (36) 

⁷ For $n = 3$, the three possibilities are $p_1; p_2; p_3$, $p_1; p_3; p_2$, and $p_3; p_1; p_2$. Using a concurrent 
construct $||$, these three programs can be packed into two programs $p_1; p_2 || p_3$ 
and $p_1; p_3; p_2$. 



In the above rules, the predicates $begin(N, I, T_3, T_1, T_2)$ and $end(N, I, T_4, T_1, T_2)$ 
are used to record the beginning and the end of the program $I$, a member 
of $N$. Rules (32)-(33) make sure that each program will have start and end times. 
These two rules are not normal logic programming rules but are unique to Smodels 
encodings. They were introduced to simplify the encoding of choice rules [28], 
and can be translated into a set of normal logic program rules. The predicate 
$nok(N, T_1, T_2)$ states that the assignments for programs are not acceptable. (We 
omit the rules that check for the satisfiability of constraints in $C$ of a program 
$htn(N, S, C)$. They can be downloaded from our Web site.) Theorem 2 will still 
hold. 

3.4 Demonstration Experiments 

We tested our implementation with some domains from the general planning 
literature and from the AIPS planning competition [2]. We chose problems for 
which procedural control knowledge appeared to be easier to exploit than other 
types of control knowledge. Our motivation was: (i) it has already been estab- 
lished that well-chosen temporal and hierarchical constraints will improve a plan- 
ner’s efficiency; (ii) we have previously experimented with the use of temporal 
knowledge in the ASP framework [29] ; and (iii) we are not aware of any empiri- 
cal results indicating the utility of procedural knowledge in planning, especially 
in ASP. ([13] concentrates on using GOLOG to do planning in domains with 
incomplete information, not on exploiting procedural knowledge in planning.) 

We selected the elevator example from [20] (elp1-elp3) and the Miconic-10 
elevator domain (s1-0, ..., s5-0s2), proposed by Schindler Lifts Ltd. for the AIPS 
2000 competition [2]. Note that some of the planners that competed in AIPS 
2000 were unable to solve this problem. Due to space limitations, we cannot 
present the action theories and the Smodels encoding of the programs here. They 
can be found at the URL mentioned previously. The times taken to compute one 
model with and without control knowledge are given in columns 5 and 6 of the 
table below, respectively. 



Problem   Plan Length   # Person   # Floors   With Control Knowledge   Without Control Knowledge
elp1      10            2          6          0.600                    0.560
elp2      14            3          6          1.411                    6.729
elp3      18            4          6          3.224                    120.693
s1-0      4             1          2          0.100                    0.020
s2-0      8             2          4          1.802                    0.921
s3-0      12            3          6          22.682                   34.519
s4-0      15            4          8          164.055                  314.101
s5-0s1    19            5          4          57.952                   > 2 hours
s5-0s2    19            5          5          105.040                  > 2 hours



As can be seen, the encoding with control knowledge yields substantially 
better performance in situations where the minimal plan length is large. For large 
instances (the last two rows), Smodels can find a plan using control knowledge 
in a short time and cannot find a plan in 2 hours without control knowledge. 
In some small instances (elp1, s1-0, and s2-0, where the time in column 6 is the 
smaller one), the speed-up cannot make up for the overhead needed in grounding 
the control knowledge. 
The output of Smodels for each run is given in the file result at the above URL. 
For larger instances of the elevator domain [2] (5 persons or more and 10 floors or 
more), our implementation terminated prematurely with either a stack overflow 
error or a segmentation fault error⁸. 
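
The speed-ups implied by the table can be computed directly from its two timing columns. The short Python sketch below does so; the variable names are ours, and the two runs that did not finish within 2 hours are represented by `None`, since no finite ratio can be derived for them.

```python
# (problem, time with control knowledge, time without control knowledge);
# None marks a run that did not finish within 2 hours.
timings = [
    ("elp1", 0.600, 0.560),
    ("elp2", 1.411, 6.729),
    ("elp3", 3.224, 120.693),
    ("s1-0", 0.100, 0.020),
    ("s2-0", 1.802, 0.921),
    ("s3-0", 22.682, 34.519),
    ("s4-0", 164.055, 314.101),
    ("s5-0s1", 57.952, None),
    ("s5-0s2", 105.040, None),
]


def speedup(with_ck, without_ck):
    """Ratio of the time without control knowledge to the time with it."""
    return None if without_ck is None else without_ck / with_ck


ratios = {name: speedup(w, wo) for name, w, wo in timings}
slower_with_ck = [name for name, r in ratios.items() if r is not None and r < 1]

for name, ratio in ratios.items():
    print(name, "n/a" if ratio is None else f"{ratio:.2f}x")
```

A ratio below 1 identifies the instances on which the grounding overhead outweighs the benefit of the control knowledge; the largest finite ratio is obtained for elp3.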



4 Discussions and Future Work 

In this paper we presented a declarative approach to adding domain-dependent 
control knowledge to ASP. Our approach enables different types of control knowl- 
edge such as hierarchical, temporal, or procedural knowledge to be represented 
and exploited in parallel; thus combining the ideas of HTN-planning, GOLOG- 
programming, and planning with temporal knowledge into ASP. For exam- 
ple, one can find a valid instantiation of a GOLOG program that satisfies 
some temporal constraints. This distinguishes our work from other related work 
[17,19,4,25] where only one or two types of constraints were considered or com- 
bined. Moreover, in a propositional environment, ASP with procedural knowl- 
edge can be viewed as an off-line interpreter for a GOLOG program. Because of 
the declarative nature of logic programming the correctness of this interpreter is 
easier to prove than an interpreter written in Prolog. We view domain-dependent 
control knowledge as independent sets of constraints. An advantage of this approach 
is that domain-dependent control knowledge can be modularly formalized 
and added to planning problems as desired. 

⁸ Experiments were run on an HP OmniBook 6000 laptop with 130,544 Kb RAM 
and an Intel Pentium III 600 MHz processor. 

Our experimental results demonstrate that ASP can scale up better with 
domain-dependent control knowledge. In keeping with the experience of researchers 
who have incorporated control knowledge into SATPLAN (e.g., [19]), 
we do not expect ASP with only one type of domain-dependent knowledge to 
do better than TLPLAN [1], as Smodels is a general purpose system. But in 
the presence of near-deterministic procedural constraints, our approach may do 
better. More rigorous experimentation with a variety of domains including those 
used in the AIPS planning competition will be a significant focus of our future 
work. 


Abstract. We investigate the relationship amongst some solutions to 
the frame problem. We encode Pednault’s syntax-based solution [20], 
Baker’s state-minimization policy [1], and Gelfond & Lifschitz’s Action 
relationships among these solutions are given. The results of the paper 
show that dynamic logic, as one of the formalisms for reasoning about 
dynamic domains, can be used as a formal tool for comparing and uni- 
fying logics of action. 

Keywords: relationships between formalisms, frame problem, dynamic 
logic. 



1 Introduction 

Among the established formalisms for specifying and reasoning about actions 
are the situation calculus [19,23], STRIPS [3], the event calculus [17], action 
languages [7] and some other monotonic or nonmonotonic logics such as in [9]. 
Fundamental problems in this area, such as the frame problem, ramification 
problem, and qualification problem, have been widely investigated with varying 
degrees of success. Clearly, the time has come to analyze, compare and system- 
atize these formalisms and solutions in order to obtain a more complete and 
unified (if possible) theory of action. 

This paper focuses on solutions to the frame problem. We compare and ana- 
lyze the main solutions to the frame problem in the literature by encoding them 
in the propositional dynamic logic (PDL). The reasons for choosing PDL as 
the medium are twofold. First, the language of dynamic logic is expressive. It 
provides built-in expression of compound actions (i.e., generated from primitive 
actions by the program connectives $;$, $\cup$, $?$, $*$), non-deterministic effects and 
qualifications of actions. It has also been extended to represent concurrent actions 
[10], non-execution of actions [8], and indirect effects of actions [11,26]. Second, 
dynamic logic features a sound and complete axiomatic deductive system and a 
well-developed Kripkean semantics. Its proof and model theory have reached a 
high degree of sophistication through the development of theoretical computer 
science. Some features, such as decidability and the finite model property of 
PDL, and techniques such as bisimulation and filtration, are well understood. 



T. Eiter, W. Faber, and M. Truszczynski (Eds.): LPNMR 2001, LNAI 2173, pp. 240—253, 2001. 
© Springer-Verlag Berlin Heidelberg 2001 



Encoding Solutions of the Frame Problem in Dynamic Logic 241 



In contrast to other formalisms, such as the situation calculus [23] and action 
languages [7], PDL does not have a built-in solution to the frame problem (PDL-based 
solutions to the frame problem have been proposed via extensions [2,8,22]). 
In this paper, however, we show that three solutions to the frame problem (Ped- 
nault’s syntax-based approach [20], Baker’s circumscription [1] and the action 
language A [7]) can be encoded in PDL. The relationship amongst these solu- 
tions is clarified and we prove that in the case that action descriptions are in 
normal form and queries are simple, these solutions to the frame problem are 
essentially equivalent. In contrast to the work in [14], our results show that the 
equivalence of the solutions heavily depends on the syntactical restrictions of 
action descriptions and queries. 

Due to the limitation of space, we omit all the proofs of theorems¹. 

2 Reasoning about Action in PDL 

In dynamic logic, a causal relation between an action $\alpha$ and its effect $A$ is expressed 
by a modal formula $[\alpha]A$, read as "$\alpha$ always causes $A$". For instance, 
$[Shoot]\neg alive$ represents "shooting at a turkey kills the turkey". The formula 
$\langle\alpha\rangle A$ reads as "$\alpha$ is executable and possibly causes $A$ to be true", where $\langle\alpha\rangle$ is the 
dual operator of $[\alpha]$. In particular, $\langle\alpha\rangle\top$ represents "$\alpha$ is executable", where $\top$ 
represents the logical constant true. $\prec\alpha\succ A$ denotes $\langle\alpha\rangle\top \rightarrow \langle\alpha\rangle A$, meaning 
"if $\alpha$ is executable, then $\alpha$ may cause $A$". A language of PDL consists of a 
set $Flu$ of fluent symbols (propositional variables) and a set $Act_P$ of primitive 
action symbols. We will use $f$, $f_1$, $f_2$, etc., to denote fluents, and use $a$, $a_1$, $a_2$, 
etc., for primitive actions. The formulas ($A \in Fma$) and actions ($\alpha \in Act$) can 
be defined as usual [18]. A formula which does not include modal operators is 
referred to as a propositional formula ($\varphi \in Fma_P$). The semantics and deductive 
system of PDL can be found in any standard introductory text, e.g., [18]. 
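
The readings of the modal operators can be illustrated on a small Kripke-style model. The Python sketch below covers only primitive actions (no compound programs), represents states as frozensets of true fluents and the accessibility relation as a set of triples, and uses hypothetical function names; it is meant as an aid to intuition rather than as an implementation of PDL.

```python
def box(action, prop, state, R):
    """[a]A holds at `state` if A holds in every a-successor of `state`."""
    return all(prop(w) for (v, a, w) in R if v == state and a == action)


def diamond(action, prop, state, R):
    """<a>A holds at `state` if A holds in some a-successor of `state`."""
    return any(prop(w) for (v, a, w) in R if v == state and a == action)


def executable(action, state, R):
    """<a>T: the action has at least one successor state."""
    return diamond(action, lambda w: True, state, R)


def may_cause(action, prop, state, R):
    """The abbreviation <a>T -> <a>A."""
    return (not executable(action, state, R)) or diamond(action, prop, state, R)


# Hypothetical model: shooting in a state where the gun is loaded and the
# turkey is alive leads to a state in which no fluent is true.
s_loaded_alive = frozenset({"loaded", "alive"})
s_dead = frozenset()
R = {(s_loaded_alive, "Shoot", s_dead)}


def not_alive(w):
    return "alive" not in w


print(box("Shoot", not_alive, s_loaded_alive, R))
print(executable("Shoot", s_dead, R))
```

Note that `box` is vacuously true in a state without successors, which is why executability has to be stated separately through the diamond operator.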

2.1 Action Description 

PDL provides a formal language to describe behaviors and internal relations of 
a dynamic system. Those sentences which describe the generic effects of actions, 
domain constraints and causal ramifications are generally called action descrip- 
tion. In this paper, an action description of a dynamic system is any finite set 
of PDL formulas. 

Example 1 Consider the Yale Shooting Problem [12] described by the following 
action description: 



¹ They are available at http://www.cse.unsw.edu.au/~ksg/Pubs/ksgworking.html. 

The first three sentences state the effects of actions Load and Shoot on fluents 
loaded and alive (effect axioms). The last three represent the executability of 
actions (qualification axioms). 

An action description $\Sigma$ is normal if each formula in $\Sigma$ is of the form: 

- $\varphi \rightarrow [a]l$ (deterministic action law) 
- $\varphi \rightarrow \prec a \succ l$ (non-deterministic action law) 
- $\varphi \rightarrow \langle a \rangle \top$ (qualification law) 

where $\varphi \in Fma_P$, $a \in Act_P$ and $l$ is a fluent literal.² 

It is easy to see that the action description in Example 1 is normal. 



2.2 Reasoning with Action Description 

A formula in an action description is different from an ordinary formula. The
sentence "$loaded \to [Shoot]\neg alive$" states that whenever loaded is true,
Shoot must cause $\neg alive$. In the situation calculus this is written as
$\forall s\,(loaded(s) \to \neg alive(do(Shoot, s)))$ instead of
$loaded(s) \to \neg alive(do(Shoot, s))$ for some particular situation $s$. A
simple approach to the problem in dynamic logic, which was introduced in [26], is
to treat an action description as a set of extra axioms of PDL.

Definition 1 [26] Let $\Sigma$ be an action description. A formula $A$ is
$\Sigma$-provable, written $\vdash^{\Sigma} A$, if it belongs to the smallest set
of formulas which contains all theorems of PDL and all elements of $\Sigma$, and is
closed under modus ponens and modal generalization [18].

Consider the action description $\Sigma$ in Example 1. We can prove that
$\vdash^{\Sigma} \neg loaded \to [Load; Shoot]\neg alive$.

2.3 Consistency of Action Description 

An action description $\Sigma$ is consistent if $\nvdash^{\Sigma} \bot$, where
$\bot$ represents the logical constant "false". Let $\Sigma$ be a normal action
description. For any fluent $f$ and any primitive action $a$, if we merge the
action laws about $a$ and $f$ ($\neg f$) in each form together, there are at most
five laws about $a$ and $f$ in $\Sigma$:

$\varphi_0 \to \langle a\rangle\top$

$\varphi_{1,1} \to [a]f$, $\varphi_{1,2} \to [a]\neg f$

$\varphi_{2,1} \to \prec a\succ \neg f$, $\varphi_{2,2} \to \prec a\succ f$

^ In [25], the normal form of action descriptions is defined in a more general version 
to express indirect effects of actions based on the extended PDL [26]. 

® By using PDL axioms and Definition 1. 

Footnote: In [26], it is called uniformly consistent, to distinguish it from the
consistency of an ordinary set of formulas.

It is easy to see that if $\varphi_0$, $\varphi_{1,1}$ and $\varphi_{1,2}$ are true
simultaneously, then the action description will contain a contradiction.
Similarly for $\varphi_0$, $\varphi_{1,j}$ and $\varphi_{2,j}$ ($j = 1$ or
$j = 2$). We call a normal action description $\Sigma$ safe if it satisfies the
following assumption:

$\vdash \neg\varphi_0 \vee \neg\varphi_{1,1} \vee \neg\varphi_{1,2}$ and
$\vdash \neg\varphi_0 \vee \neg\varphi_{1,j} \vee \neg\varphi_{2,j}$ ($j = 1, 2$)

The following theorem shows that safety is a sufficient condition for the
consistency of normal action descriptions.

Theorem 1 [25] Let $\Sigma$ be a normal action description. If $\Sigma$ is safe,
then it is consistent.

Since the action description in Example 1 is safe, it is consistent.
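The safety condition can be checked mechanically for a single action and fluent. The following is only a minimal sketch under stated assumptions: conditions are modelled as Python predicates over truth assignments, a missing law is represented by the constant-false condition, and safety is read as "the conditions $\varphi_0, \varphi_{1,1}, \varphi_{1,2}$ are never jointly true, and neither are $\varphi_0, \varphi_{1,j}, \varphi_{2,j}$". The names `jointly_unsatisfiable` and `is_safe_for` are hypothetical, and the sample conditions are illustrative rather than taken from Example 1.

```python
from itertools import product

def jointly_unsatisfiable(conditions, fluents):
    """True iff no truth assignment to `fluents` makes every condition true."""
    for values in product([False, True], repeat=len(fluents)):
        state = dict(zip(fluents, values))
        if all(cond(state) for cond in conditions):
            return False
    return True

def is_safe_for(phi0, phi11, phi12, phi21, phi22, fluents):
    """Safety check for one (action, fluent) pair.

    phi0        : condition of the qualification law
    phi11, phi12: conditions of the laws for [a]f and [a]-f
    phi21, phi22: conditions of the non-deterministic laws for -f and f
    """
    return (jointly_unsatisfiable([phi0, phi11, phi12], fluents)
            and jointly_unsatisfiable([phi0, phi11, phi21], fluents)
            and jointly_unsatisfiable([phi0, phi12, phi22], fluents))

# Illustrative conditions (assumptions, not the paper's Example 1).
fluents = ["loaded"]
always = lambda s: True
never = lambda s: False          # stands for "no such law"
loaded = lambda s: s["loaded"]

safe_case = is_safe_for(always, never, loaded, never, never, fluents)
unsafe_case = is_safe_for(always, loaded, loaded, never, never, fluents)
```

Brute-force enumeration is exponential in the number of fluents, so this is only meant to make the condition concrete for small vocabularies.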

We remark that the normal form is quite expressive though not every action 
description can be expressed in normal form. Any action description written 
in the form of pre-condition axioms and successor state axioms in the proposi- 
tional situation calculus language (that is, there are no sort object and function 
symbols in the language [23]) can be translated into normal form and moreover 
the resultant action descriptions are safe. Action descriptions written in the
action language $\mathcal{A}$ or in
STRIPS can also be expressed in normal form. Additionally, the determinism of
action (i.e., for any initial state there exists one and only one next state) can be 
expressed by normal form. 

3 Properties of PDL Models 

We now present some special properties of PDL models which are not included 
in the standard discourse of dynamic logic but are useful for the purpose of the 
paper. 

3.1 PDL Models 

A model for a PDL language is a structure of the form
$M = (W, \{R_a : a \in Act_P\}, V)$, with $R_a$ a binary relation on $W$ for each
primitive action $a$. Note that we only consider the accessibility relations of
primitive actions. Those for compound actions can be defined by using the standard
model conditions [18]. The satisfiability relation is defined as usual. That a
model $M$ satisfies a formula $A$ in world $w$ is denoted by $M \models_w A$. $A$
is valid in $M$, denoted by $M \models A$, if $M \models_w A$ for all $w \in W$.
Let $\Sigma$ be an action description. A model $M$ is a $\Sigma$-model if
$M \models A$ for any $A \in \Sigma$. Intuitively, a model is a $\Sigma$-model if
$\Sigma$ is true in every world of the model; $Mod(\Sigma)$ denotes the set of all
$\Sigma$-models. In [26], it is shown that for any action description $\Sigma$,
$\vdash^{\Sigma} A$ iff $A$ is valid in all $\Sigma$-models. We now investigate
models which are relevant to the models of action languages and the situation
calculus.

Definition 2 A model $M = (W, \mathcal{R}, V)$ is saturated if for each
interpretation $I$ of $Flu$, there exists $w \in W$ such that $M \models_w I$. We
use $Mod_S(\Sigma)$ to denote the set of all saturated $\Sigma$-models.

Proposition 1 If $\Sigma$ is normal and safe, then $\vdash^{\Sigma} A$ iff
$M \models A$ for any $M \in Mod_S(\Sigma)$.

We will show in section 6 that the saturation of PDL models corresponds to 
the Existence of Situation Axioms (ESA) [1]. Note that Proposition 1 depends on 
the definition of the normal form of action description. If we allow a normal action 
description to describe domain constraints or indirect effects, this proposition 
will cease to hold. 

Definition 3 A model $M = (W, \mathcal{R}, V)$ is natural if

1. $W$ is the set of all interpretations of $Flu$,

2. for any $f \in Flu$, $w \in V(f)$ iff $f \in w$.

We denote the set of all natural $\Sigma$-models by $Mod_N(\Sigma)$.

It is easy to see that any natural model is saturated.

Proposition 2 If $\Sigma$ is normal and safe, then
$\vdash^{\Sigma} \varphi \to [a_1; \cdots; a_n]l$ iff
$M \models \varphi \to [a_1; \cdots; a_n]l$ for any $M \in Mod_N(\Sigma)$.

A formula of the form $\varphi \to [a_1; \cdots; a_n]l$ is referred to as a simple
query, where $\varphi$ is a propositional formula and $l$ a literal. Notice that
Proposition 2 is only true for simple queries. For instance, let
$\Sigma = \emptyset$, $Flu = \{f\}$ and $Act_P = \{a\}$. Let
A = fM < a>- ->/V ^ a > — < a>- fy ^ a > — < a > — < a >- f. Then A is valid in all 
the natural models but $\nvdash^{\Sigma} A$. This proposition is a key lemma of Theorem 2.

Definition 4 A model $M = (W, \mathcal{R}, V)$ is functional if for any
$a \in Act_P$, $R_a$ is a function on $W$. We denote the set of all natural
functional $\Sigma$-models by $Mod_{NF}(\Sigma)$.

The syntactical condition corresponding to functional models is so-called
determinism, which means that each state has one and only one next state after an
action.

Definition 5 Let $S = \{\langle a\rangle f \to [a]f : a \in Act_P$ and
$f \in Flu\} \cup \{\langle a\rangle\top : a \in Act_P\}$. An action description
$\Sigma$ is deterministic if every formula in $S$ is $\Sigma$-provable.

Note that $\langle a\rangle f \to [a]f$ can be expressed in normal form in the
following way: $f' \to [a]f$, $\neg f' \to [a]\neg f$, where $f'$ is a new fluent
symbol (in most cases, we can put the descriptions of determinism and effects of
actions together without introducing new fluent symbols).

Proposition 3 Let $\Sigma$ be normal and safe. If $\Sigma$ is deterministic, then
$\vdash^{\Sigma} A$ iff $M \models A$ for any $M \in Mod_{NF}(\Sigma)$.

Note the difference between Proposition 2 and 3. We can relax the restriction 
of simple query at the price of allowing only deterministic action descriptions. 

Footnote: Here we assume that a deterministic action is always executable for
simplicity. It can be relaxed at the price of a more complex formalization.

3.2 Minimizing PDL Models

Let $M = (W, \mathcal{R}, V)$ be a PDL model. For any $w \in W$, let
$\|w\| = \{f \in Flu : M \models_w f\} \cup \{\neg f : f \in Flu \ \&\ M \models_w \neg f\}$.
We denote
$Chg(M) = \{(a, f, w) : \exists w' (w R_a w' \ \&\ f \in (\|w\| \setminus \|w'\|) \cup (\|w'\| \setminus \|w\|))\}$.
In words, $(a, f, w) \in Chg(M)$ iff there exists a world $w'$ accessible from $w$
by action $a$ such that the truth value of $f$ is different at $w$ and $w'$.

Definition 6 For any $M_1, M_2 \in Mod(\Sigma)$, $M_1 \sqsubseteq M_2$ iff

1. $W_1 = W_2$,

2. $V_1(f) = V_2(f)$,

3. $Chg(M_1) \subseteq Chg(M_2)$.

We denote the set of $\sqsubseteq$-minimal models in $Mod(\Sigma)$ as
$\min(Mod(\Sigma))$. Intuitively, $M_1 \sqsubseteq M_2$ means that $M_1$ has less
state change than $M_2$.
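To make the change set and the ordering of Definition 6 concrete, here is a small sketch over finite models. The encoding is an assumption made for illustration: a model is a triple `(W, R, V)` where `R` maps each action name to a set of world pairs and `V` maps each fluent to the set of worlds where it holds. The function names `chg` and `leq` and the two toy models are hypothetical.

```python
def chg(R, V, fluents):
    """Chg(M): triples (a, f, w) such that some R_a-successor of w
    gives fluent f a different truth value than w does."""
    return {(a, f, w)
            for a, pairs in R.items()
            for (w, w2) in pairs
            for f in fluents
            if (w in V[f]) != (w2 in V[f])}

def leq(M1, M2, fluents):
    """M1 is below M2 in the ordering of Definition 6: same worlds,
    same valuation, and Chg(M1) is a subset of Chg(M2)."""
    W1, R1, V1 = M1
    W2, R2, V2 = M2
    return (W1 == W2
            and all(V1[f] == V2[f] for f in fluents)
            and chg(R1, V1, fluents) <= chg(R2, V2, fluents))

# Two toy models over the same worlds and valuation (illustrative only).
fluents = ["alive"]
W = {"w0", "w1"}
V = {"alive": {"w0"}}
M_still = (W, {"Wait": {("w0", "w0"), ("w1", "w1")}}, V)  # Wait changes nothing
M_moves = (W, {"Wait": {("w0", "w1"), ("w1", "w1")}}, V)  # Wait may flip alive

changes = chg(M_moves[1], V, fluents)
```

In this toy case `M_still` is below `M_moves` but not conversely, which matches the intuition that the preferred model is the one with less state change.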

4 Pednault’s Solution to the Frame Problem 

We first encode Pednault’s syntax-based solution [20] to the frame problem in 
PDL. Before doing this, let’s recall the meaning of the frame problem. 

To formalize the effects of actions in a dynamic system, it is necessary to 
provide all the effect axioms of actions (which specify what is affected by actions). 
Often this is easy because most actions affect only a few of the relevant fluents. 
In contrast, listing all the frame axioms (which specify what is not affected by 
actions) is tedious. Moreover, they are much more numerous than effect axioms. 
For instance, in Example 1, only effect axioms were listed. There are nine frame
axioms, such as $alive \to [Load]alive$, $loaded \to [Wait]loaded$, etc., that
were not listed. Without these axioms, the action description is incomplete. We
cannot even establish the intuitive assertion $alive \to [Load]alive$. The frame
problem is how to devise an inference mechanism for reasoning about the effects of
actions with incomplete action descriptions.

Pednault [20] introduced an approach to the frame problem with which frame axioms
can be automatically generated from effect axioms and qualification axioms.
Consider a normal action description $\Sigma$ without non-deterministic action
laws. Suppose that the qualification axiom and the positive and negative effect
axioms about action $a$ and fluent $f$ in $\Sigma$ are:

$\varphi_0 \to \langle a\rangle\top$, $\varphi_1 \to [a]f$, $\varphi_2 \to [a]\neg f$.

According to the Completeness Assumption [23], we have the following frame
axioms:

$(\neg\varphi_2) \wedge f \to [a]f$
$(\neg\varphi_1) \wedge \neg f \to [a]\neg f$

All frame axioms generated by this procedure are referred to as the frame axioms
with respect to $\Sigma$. For instance, $\neg loaded \wedge alive \to [Shoot]alive$
is a frame axiom about Shoot and alive with respect to the action description in
Example 1. Suppose that $\Delta$ is the set of all the generated frame axioms with
respect to $\Sigma$. Then we are able to prove that
$\{\neg loaded\} \vdash^{\Sigma \cup \Delta} [Load; Wait; Shoot]\neg alive$.

In general, given a set $\Sigma$ of effect axioms, we generate all the frame axioms
with the above procedure. Let $\Delta$ be all the generated frame axioms. Then
$\Sigma \cup \Delta$ will be the complete action description with respect to
$\Sigma$. Therefore, to answer a query $A$, we only have to check whether
$\vdash^{\Sigma \cup \Delta} A$.
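The generation step can be sketched as a purely syntactic procedure. The sketch below assumes that the frame axiom for preserving $f$ is guarded by the negated condition of the law that makes $f$ false, and symmetrically for $\neg f$, which is the reading consistent with the example $\neg loaded \wedge alive \to [Shoot]alive$. Formulas are plain strings with `~`, `&` and `->` as connectives, a missing law is represented by the condition `false`, and the function name `frame_axioms` is hypothetical. Only the law $loaded \to [Shoot]\neg alive$ quoted earlier is used as input.

```python
def frame_axioms(effect_laws, actions, fluents):
    """Generate frame axioms as strings.

    effect_laws maps (action, fluent) to a pair (phi1, phi2) meaning
    phi1 -> [a]f and phi2 -> [a]~f; missing pairs default to 'false'.
    """
    axioms = []
    for a in actions:
        for f in fluents:
            phi1, phi2 = effect_laws.get((a, f), ("false", "false"))
            # f persists unless the condition for making it false holds
            axioms.append(f"~({phi2}) & {f} -> [{a}]{f}")
            # ~f persists unless the condition for making f true holds
            axioms.append(f"~({phi1}) & ~{f} -> [{a}]~{f}")
    return axioms

actions = ["Load", "Wait", "Shoot"]
fluents = ["loaded", "alive"]
laws = {("Shoot", "alive"): ("false", "loaded")}  # loaded -> [Shoot]~alive

generated = frame_axioms(laws, actions, fluents)
```

The generator emits two candidate axioms per action and fluent and does not simplify guards such as `~(false)`, so some of its output is redundant; a real implementation would simplify or discard those.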

The following observation establishes the semantic condition for Pednault's
solution. It also gives the relationship between syntax-based and
minimization-based approaches.

Observation 1 Let $\Sigma$ be a normal action description without
non-deterministic action laws. If $\Sigma$ is safe, then
$\vdash^{\Sigma \cup \Delta} A$ iff $M \models A$ for any
$M \in \min(Mod_S(\Sigma))$, where $\Delta$ is the set of frame axioms with
respect to $\Sigma$.

It is not hard to extend Pednault's solution to the non-deterministic case.



5 Encoding the Action Language $\mathcal{A}$ in PDL

The action languages [6] offer a simple and elegant solution to the frame problem.
In this section, we show that the action language $\mathcal{A}$ can be embedded
into PDL. Our approach can also be extended to the action languages $\mathcal{B}$
and $\mathcal{C}$ if we base it on the extended propositional dynamic logic (EPDL)
[26].

An action description $\Sigma$ in the language $\mathcal{A}$ [6] is a set of
expressions of the form: $a$ causes $l$ if $\varphi$, where $a$ is a primitive
action, $l$ is a fluent literal, and $\varphi$ is a conjunction of literals. The
state of a dynamic domain is expressed by a set of axioms of the form: now $l$. A
query in action language $\mathcal{A}$ is an expression of the form: necessarily
$\varphi$ after $a_1, \cdots, a_n$, where $\varphi$ is a propositional formula and
$a_1, \cdots, a_n$ are primitive actions.

A structure $T = (W, \{R_a \subseteq W \times W : a \in Act_P\}, V)$ is a
transition system of an action description $\Sigma$ if

1. $W$ is the set of all interpretations of $Flu$,

2. $V$ is a function from $Flu$ to $2^W$ such that $w \in V(f)$ iff $f \in w$,

3. $(w, w') \in R_a$ iff $E(a, w) \subseteq w' \subseteq E(a, w) \cup w$, where
$E(a, w)$ is the set of the heads $l$ of all expressions "$a$ causes $l$ if
$\varphi$" in $\Sigma$ such that $w$ satisfies $\varphi$.

Let $\Gamma$ be a set of expressions of the form: now $l$. A query "necessarily
$\varphi$ after $a_1, \cdots, a_n$" is a consequence of $\Gamma$ in $T$ if, for any
chain $(w_0, w_1) \in R_{a_1}, \cdots, (w_{n-1}, w_n) \in R_{a_n}$, whenever $w_0$
satisfies $l$ for each now $l \in \Gamma$, $w_n$ satisfies $\varphi$. A query
"necessarily $\varphi$ after $a_1, \cdots, a_n$" is a consequence of $\Gamma$ with
respect to an action description $\Sigma$ if it is a consequence of $\Gamma$ in any
transition system of $\Sigma$.
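The transition-system semantics can be illustrated by a brute-force sketch. It assumes that an interpretation is represented as a complete, consistent set of literal strings (negation written with a leading `-`), that a law "a causes l if phi" is a triple `(a, l, phi)` with `phi` a list of literals, and that the query goal is restricted to a conjunction of literals. All function names are hypothetical, and the law for Load is an illustrative assumption in the style of the Yale Shooting Problem.

```python
from itertools import product

def interpretations(fluents):
    """All complete, consistent sets of literals over `fluents`."""
    for values in product([True, False], repeat=len(fluents)):
        yield frozenset(f if v else "-" + f for f, v in zip(fluents, values))

def effects(laws, a, w):
    """E(a, w): heads of the laws for action a whose condition holds in w."""
    return {l for (act, l, cond) in laws if act == a and set(cond) <= w}

def successors(laws, a, w, fluents):
    """All interpretations w2 with E(a, w) <= w2 <= E(a, w) | w."""
    e = effects(laws, a, w)
    return [w2 for w2 in interpretations(fluents) if e <= w2 <= (e | w)]

def is_consequence(laws, fluents, initial, actions, goal):
    """Check 'necessarily goal after actions' against the 'now l' facts."""
    frontier = [w for w in interpretations(fluents) if set(initial) <= w]
    for a in actions:
        frontier = [w2 for w in frontier
                    for w2 in successors(laws, a, w, fluents)]
    return all(set(goal) <= w for w in frontier)

fluents = ["loaded", "alive"]
laws = [("Load", "loaded", []),            # assumed law: Load causes loaded
        ("Shoot", "-alive", ["loaded"])]   # Shoot causes -alive if loaded

after_load_shoot = is_consequence(laws, fluents, ["-loaded"],
                                  ["Load", "Shoot"], ["-alive"])
after_shoot_only = is_consequence(laws, fluents, ["-loaded"],
                                  ["Shoot"], ["-alive"])
```

Because the successor must lie between $E(a, w)$ and $E(a, w) \cup w$, literals untouched by the action are carried over unchanged, which is how the transition system builds inertia into the semantics.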

According to the translation between $\mathcal{A}$ and PDL shown in the Appendix,
we can easily transform an action description and a state description between the
two languages. Since such a translation is one-to-one, we will only use the PDL
language to describe action descriptions, initial states and queries. They are
easily recognized from context. It is easy to see that an action description in
language $\mathcal{A}$ is always normal and safe. There is an important difference
between the semantics of the action language and PDL. In $\mathcal{A}$, there is no
explicit expression for qualification of actions. An underlying assumption in the
semantics, called Qualification Completeness, is that an action is always
executable unless the action description implies that it is not. In PDL, there is
no such assumption. Thus qualification of actions must be explicitly specified.

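Let $\Sigma$ be a finite action description in $\mathcal{A}$. Suppose the action
laws about an action $a$ and fluent $f$ are $\varphi_1 \to [a]f$ and
$\varphi_2 \to [a]\neg f$. This implies that $a$ is not executable when
$\varphi_1 \wedge \varphi_2$. Collecting all the conditions of non-executability,
$\varphi_1^1 \wedge \varphi_2^1, \cdots, \varphi_1^n \wedge \varphi_2^n$, we know
that $a$ is not executable if
$(\varphi_1^1 \wedge \varphi_2^1) \vee \cdots \vee (\varphi_1^n \wedge \varphi_2^n)$.
By Qualification Completeness, we assume that
$(\neg(\varphi_1^1 \wedge \varphi_2^1) \wedge \cdots \wedge \neg(\varphi_1^n \wedge \varphi_2^n)) \to \langle a\rangle\top$;
such a condition is an induced qualification law. Let $\Lambda$ be the set of all
such laws from $\Sigma$. Then we have

Observation 2 Let $\Sigma$ be a finite action description and $\Gamma$ a finite set
of axioms, both in $\mathcal{A}$. A query "necessarily $\varphi$ after
$a_1, \cdots, a_n$" is a consequence of $\Gamma$ with respect to $\Sigma$ iff
$\vdash^{\Sigma \cup \Lambda \cup \Delta} (\bigwedge \Gamma) \to [a_1; \cdots; a_n]\varphi$,
where $\Delta$ is the set of frame axioms with respect to $\Sigma \cup \Lambda$.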

Clearly, the expressive power of $\mathcal{A}$ is quite restricted. Action
descriptions can only be normal, and queries can only be simple in our terminology.



6 Encoding Baker’s Solution in PDL Models 

Finally we consider Baker’s solution. First, we have to recall the basic assumption 
of the approach. 

6.1 Models of Situation Calculus 

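A model $\mathcal{M}$ of the situation calculus [1,19] (an SC-model) consists of
the various domains: the domain of situations $|\mathcal{M}|_s$, the domain of
actions $|\mathcal{M}|_a$ and the domain of fluents $|\mathcal{M}|_f$; as well as
interpretations for the constants:

1. Interpretations for the relations Holds and Ab:

$Holds^{\mathcal{M}} \subseteq |\mathcal{M}|_f \times |\mathcal{M}|_s$,
$Ab^{\mathcal{M}} \subseteq |\mathcal{M}|_a \times |\mathcal{M}|_f \times |\mathcal{M}|_s$,

2. Interpretation for the Result function:

$Result^{\mathcal{M}} \in (|\mathcal{M}|_a \times |\mathcal{M}|_s \to |\mathcal{M}|_s)$.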

The following axioms were used in Baker’s circumscriptive solution to the 
frame problem: 

1. Unique names axioms:

- Unique Name Axioms for Fluents (UNAF): for any distinct $f_1, f_2 \in Flu$, $f_1 \neq f_2$.
- Unique Name Axioms for Actions (UNAA): for any distinct $a_1, a_2 \in Act_P$, $a_1 \neq a_2$.

2. Commonsense Law of Inertia (CLI):

$\neg Ab(a, f, s) \to (Holds(f, Result(a, s)) \leftrightarrow Holds(f, s))$

3. Domain Closure Axioms:

- Domain Closure Axiom for Fluents (DCAF):

$f = f_1 \vee f = f_2 \vee \cdots \vee f = f_n \vee \cdots$

- Domain Closure Axiom for Actions (DCAA):

$a = a_1 \vee a = a_2 \vee \cdots \vee a = a_n \vee \cdots$

4. Existence of Situation Axioms (ESA):

$\exists s (Holds(f_1, s) \wedge Holds(f_2, s) \wedge \cdots \wedge Holds(f_n, s) \wedge \cdots)$

$\exists s (Holds(f_1, s) \wedge \neg Holds(f_2, s) \wedge \cdots \wedge Holds(f_n, s) \wedge \cdots)$

$\vdots$

$\exists s (\neg Holds(f_1, s) \wedge \neg Holds(f_2, s) \wedge \cdots \wedge \neg Holds(f_n, s) \wedge \cdots)$

For the sake of simplicity, we omit the formal presentation of domain closure and
existence of situation axioms and ignore language differences in the
representation of an action description based on the translation in the Appendix.
Therefore, $\Sigma$ is an action description in the situation calculus if it is a
translation of an action description in PDL. Furthermore, an SC model
$\mathcal{M}$ is a $\Sigma$-model if $\mathcal{M}$ satisfies all the formulas in
$\Sigma$.

6.2 Relations between SC Models and PDL Models 

First, we translate an SC model to a PDL model. 

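Definition 7 Let $\mathcal{M}$ be an SC model. A PDL model
$M = (W, \mathcal{R}, V)$ is the corresponding model of $\mathcal{M}$ if

1. $W = |\mathcal{M}|_s$,

2. $s_1^{\mathcal{M}} R_a s_2^{\mathcal{M}}$ iff $s_2^{\mathcal{M}} = Result^{\mathcal{M}}(a^{\mathcal{M}}, s_1^{\mathcal{M}})$,

3. $s^{\mathcal{M}} \in V(f)$ iff $Holds^{\mathcal{M}}(f^{\mathcal{M}}, s^{\mathcal{M}})$.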

Lemma 1 Let $M = (W, \mathcal{R}, V)$ be the PDL model corresponding to an SC model
$\mathcal{M}$. Then

1. $M \models_{s^{\mathcal{M}}} \varphi$ iff $\mathcal{M} \models Holds(\varphi, s)$.

2. $M$ is functional.

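3. If $\mathcal{M}$ satisfies the Commonsense Law of Inertia, then
$(a, f, s^{\mathcal{M}}) \in Chg(M)$ implies
$(a^{\mathcal{M}}, f^{\mathcal{M}}, s^{\mathcal{M}}) \in Ab^{\mathcal{M}}$.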

4. If $\Sigma$ is a normal action description, then $M$ is a $\Sigma$-model iff
$\mathcal{M}$ is a $\Sigma$-model.

5. If $\mathcal{M}$ satisfies the Existence of Situation Axioms, then $M$ is saturated.

Next, we consider the transformation of PDL models to SC models. 

Definition 8 Let $M = (W, \mathcal{R}, V)$ be a functional PDL model. An SC model
$\mathcal{M}$ is the corresponding model of $M$ if

1. $|\mathcal{M}|_f = Flu$, $|\mathcal{M}|_a = Act_P$, $|\mathcal{M}|_s = W$.

2. $s' = Result^{\mathcal{M}}(a, s)$ iff $(s, s') \in R_a$.

3. $(f, s) \in Holds^{\mathcal{M}}$ iff $s \in V(f)$.

4. $(a, f, s) \in Ab^{\mathcal{M}}$ iff $(a, f, s) \in Chg(M)$.

Lemma 2 Let $\mathcal{M}$ be the corresponding model of a functional PDL model
$M = (W, \mathcal{R}, V)$. Then

1. $\mathcal{M} \models Holds(\varphi, s)$ iff $M \models_s \varphi$.

2. $\mathcal{M}$ satisfies the Commonsense Law of Inertia.

3. $\mathcal{M}$ satisfies the Domain Closure Axioms for Fluents and Actions.

4. $\mathcal{M}$ satisfies the Unique Names Axioms for Fluents and Actions.

5. If $\Sigma$ is a normal action description, then $\mathcal{M}$ is a
$\Sigma$-model iff $M$ is a $\Sigma$-model.

6. If $M$ is saturated, then $\mathcal{M}$ satisfies the Existence of Situation Axioms.
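The construction of the corresponding SC model, and the fact that it satisfies the Commonsense Law of Inertia, can be sketched for finite models. The encoding is assumed for illustration only: `R` maps each action to a set of world pairs, `V` maps each fluent to the set of worlds where it holds, and the SC model is returned as a `Result` dictionary plus `Holds` and `Ab` sets. The function names are hypothetical, and the sketch checks that each relation is single-valued but not that it is total.

```python
def corresponding_sc_model(R, V, fluents):
    """Build (Result, Holds, Ab) from a functional PDL model."""
    result = {}
    for a, pairs in R.items():
        for (s, s2) in pairs:
            if (a, s) in result and result[(a, s)] != s2:
                raise ValueError("R_a is not a function")
            result[(a, s)] = s2
    holds = {(f, s) for f in fluents for s in V[f]}
    # Ab is exactly the change set: f differs between s and Result(a, s)
    ab = {(a, f, s)
          for (a, s), s2 in result.items()
          for f in fluents
          if (s in V[f]) != (s2 in V[f])}
    return result, holds, ab

def satisfies_inertia(result, holds, ab, fluents):
    """not Ab(a, f, s) implies Holds(f, Result(a, s)) iff Holds(f, s)."""
    return all((a, f, s) in ab
               or ((f, s2) in holds) == ((f, s) in holds)
               for (a, s), s2 in result.items()
               for f in fluents)

# Toy functional model (illustrative only): Shoot leads from s0 to s1.
fluents = ["alive"]
R = {"Shoot": {("s0", "s1"), ("s1", "s1")}}
V = {"alive": {"s0"}}

result, holds, ab = corresponding_sc_model(R, V, fluents)
inertia_ok = satisfies_inertia(result, holds, ab, fluents)
```

Since `Ab` is defined to be the change set, the inertia check succeeds by construction, which mirrors the second item of Lemma 2.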

The following shows the relationship between SC models and PDL models. 

Lemma 3 Let $M$ be a functional PDL model. If $\mathcal{M}$ is the corresponding
model of $M$, then $M$ is the corresponding model of $\mathcal{M}$. Conversely,
suppose that $\mathcal{M}$ is an SC model and $M$ the corresponding model of
$\mathcal{M}$. If $\mathcal{M}$ satisfies:

1. Domain Closure Axioms for Fluents and Actions,

2. Unique Names Axioms for Fluents and Actions,

then $\mathcal{M}$ is the corresponding model of $M$.

6.3 Relationship between Baker’s Circumscription Policy and 
PDL-Model-Based Minimization 

Since the Holds function cannot be nested, not every formula in PDL can be
translated into the situation calculus language. We call an action description
SC-expressible if it can be translated into the situation calculus language.

Observation 3 Let $\Sigma$ be a deterministic and SC-expressible action description.

1. $M \in \min(Mod_F(\Sigma))$ if and only if its corresponding SC model is a model
of $CIRCUM(\Sigma \cup F; Ab; Result)$.

2. $\mathcal{M}$ is a model of $CIRCUM(\Sigma \cup F; Ab; Result)$ if and only if
its corresponding model is in $\min(Mod_F(\Sigma))$.

where $F$ is the set of UNAF, UNAA, DCAF and DCAA.

Note that the action description in the observation is not necessarily normal. 
However, if we impose syntactical restrictions on action description and queries, 
we can prove that all the solutions to the frame problem we considered thus far 
are equivalent. This result corresponds to the one in [14]. 

Corollary 1 Let $\Sigma$ be a normal action description, $\Gamma$ a finite set of
literals. If $\Sigma$ is deterministic and safe, then the following statements are
equivalent:

1. $\vdash^{\Sigma \cup \Delta} (\bigwedge \Gamma) \to [a_1; \cdots; a_n]l$, where
$\Delta$ is the set of frame axioms with respect to $\Sigma$.

2. For any model $M \in \min(Mod_{NF}(\Sigma))$,
$M \models (\bigwedge \Gamma) \to [a_1; \cdots; a_n]l$.

3. "necessarily $l$ after $a_1, \cdots, a_n$" is a consequence of $\Gamma$ with
respect to $\Sigma$.

4. $CIRCUM(\Sigma \cup F; Ab; Result) \models \forall s (Holds((\bigwedge \Gamma), s) \to Holds(l, Result(a_1, \cdots, a_n, s)))$.

where $F$ is the set of UNAF, UNAA, DCAF, DCAA and ESA.

7 Conclusion and Discussion

We have encoded three typical solutions to the frame problem in PDL, in either
syntax or semantics: Pednault's syntax-based solution, Baker's circumscription and
Gelfond & Lifschitz's action language $\mathcal{A}$. Three observations have been
given which show the formal relationships among these solutions and which help
towards a fuller understanding of the frame problem and the associated solutions.
As a corollary of these observations, we know that for normal and safe action
descriptions and simple queries, all the solutions to the frame problem are
equivalent. This corresponds to the result in [14], where Pednault's, Reiter's and
Baker's solutions to the frame problem were compared based on action language
$\mathcal{A}$. A crucial difference between Kartha's result and ours is the
following. Action language $\mathcal{A}$ is the least expressive language among
the formalisms of action. Under its restrictions we cannot see the difference
among the solutions (Corollary 1). In contrast, dynamic logic is the most
expressive at a given level (propositional or first-order). This makes a
systematic comparison of formalisms of action possible. Additionally, the
soundness and completeness of dynamic logic bridge the syntax and semantics, which
makes the unification of different approaches possible.

With help of the formal results in the paper, we would like to make the 
following remarks: 

Syntactical restrictions: The equivalence among the solutions to the frame
problem relies heavily on the syntactical restrictions on action descriptions and
queries. For instance, if $\Sigma$ is not normal, the validity of a formula $A$ in
all the natural saturated $\Sigma$-models does not guarantee $\vdash^{\Sigma} A$.
Thus the link between $\Sigma$-provability in PDL and provability from transition
systems of action language $\mathcal{A}$ will not exist. Additionally, the form of
queries is also crucial to the equivalence. Fortunately, the link between
minimizing PDL models and minimizing SC models does not depend on the normality of
the action description.

Extensibility of action formalisms: Each formalism of action has been or is
intended to be extended to accommodate non-deterministic and indirect effects of
actions as well as compound and concurrent actions. Compatible extensions of these
formalisms will approximate dynamic logic in expressiveness. For instance,
extending $\mathcal{A}$ to express general queries requires transition systems to
allow "non-natural" models according to Proposition 2. Currently, to express
programs or compound actions, dynamic logic might be the best formalism among the
existing ones.

Epistemic minimization and physical minimization: We know that 
Baker’s circumscriptive policy (varying Result) corresponds exactly to the min- 
imization of PDL models. We may remember that we took a detour, varying 
Holds, before we reached the “right solution”: state-minimization [24]. Such a 
detour does not seem necessary in PDL models or transition systems. There is 
a subtle difference between circumscriptive first-order models and minimizing 
PDL models. With circumscription we minimize abnormality whereas in PDL 
we minimize change of worlds. We refer to the former kind of minimization as
epistemic and the latter as physical.

Action-oriented frame problem: We have considered Pednault's syntax-based solution
and Baker's model-based solution to the frame problem. However, these solutions
only work for the so-called fluent-oriented frame problem (see [13]). A remaining
challenge is to encode the solutions to the action-oriented frame problem, i.e.,
how to make it the default in PDL that only actions mentioned in the action
description have effects. A typical approach to the action-oriented frame problem
is using action variables to range over all actions which have effects. A compact
representation of frame axioms can then be offered by using the Explanation
Closure Assumption and quantifying over action variables [23,13]. Such an approach
cannot be encoded in PDL because there are no action variables (even in
first-order versions). In [5], it was shown that given a normal and safe action
description $\Sigma$, if $\vdash^{\Sigma \cup \Delta} A$, where $\Delta$ is the set
of all the frame axioms with respect to $\Sigma$, then there is a subset $\Delta'$
of $\Delta$ such that $\vdash^{\Sigma \cup \Delta'} A$ and all the action symbols
occurring in $\Delta'$ occur in $A$. This means that if we postpone listing frame
axioms until a query arises, frame axioms in which the actions are irrelevant to
the query are not needed for answering the query. Therefore, the action-oriented
frame problem is not a problem in this sense.

Appendix: Translations between Languages 

We now provide an intertranslation between dynamic logic, situation calculus and
action languages. This intertranslation is not formal. For instance, a fluent
symbol stands for a proposition in PDL but is an individual in situation calculus.
Again, $Holds(\varphi, S_0)$ makes sense only for the extended Holds predicate.
Additionally, all these translations depend on the semantics of the associated
action logics.



1. Expressions for describing initial state:

Abstract. The language $\mathcal{E}$ for reasoning about actions and change can
be translated into an argumentation framework. In this paper, we extend this
translation of the basic language and show how it can, together with methods from
abduction, form the basis for a principled implementation of $\mathcal{E}$. The
extension we have considered concerns the addition of new types of sentences to
the language as well as allowing theories where the narrative of events given is
incomplete.

A system, called $\mathcal{E}$-RES, is developed within the argumentation
framework of Logic Programming without Negation as Failure (LPwNF). This can
directly support a variety of modes of common sense reasoning such as: default
persistence in credulous or sceptical form, assimilation of observations and
their diagnosis possibly under incomplete information, as well as combinations of
these. To improve the efficiency of the system we have considered the integration
of a SAT solver within the LPwNF computation, to carry out the task of validating
the time universal constraints imposed by ramification statements.



1 Introduction 

General formalisms of action and change can provide a natural framework for a
variety of AI problems such as diagnosis, planning and cognitive robotics. They
can offer a high level of expressivity and a basis for the development of a
computational framework to solve these problems. In this paper we study how one
such formalism, the Language $\mathcal{E}$ [10], can be developed into a framework
capable of supporting a variety of basic reasoning modes needed to address these
types of AI applications.

The computational foundation of this framework and its associated system, called
$\mathcal{E}$-RES, is a re-formulation of the Language $\mathcal{E}$ in terms of
argumentation [2], within the framework of Logic Programming without Negation as
Failure (LPwNF) [3], together with a synthesis of methods from abductive reasoning
[9]. This allows a principled implementation of the $\mathcal{E}$-RES system in a
way that separates issues of expressiveness and efficiency. It is then possible to
examine how we can use in a modular way "external" solvers, e.g. a SAT solver [6]
or a notion of relevancy of part of the theory to the goal at hand, for improving
the computational behaviour of the framework.



2 The Language $\mathcal{E}$ and Its Model Semantics

The vocabulary of the Language $\mathcal{E}$ consists of a set $\Phi$ of fluent
constants, a set $\Delta$ of action constants, and a partially ordered set
$\langle \Pi, \preceq \rangle$ of time-points. This vocabulary depends each time on
the domain being modeled. A fluent literal is either a fluent constant $F$ or its
negation $\neg F$. In the current implementation of the $\mathcal{E}$-RES system
the only time structure that is supported is that of the natural numbers, so we
restrict our attention here to domains of this type.

Domain descriptions in the Language $\mathcal{E}$ are collections of the following
kinds of statements (where $A$ is an action constant, $T$ is a time-point, $F$ is a
fluent constant, $L$ is a fluent literal and $C$ is a set of fluent literals):

— t-propositions: L holds-at T 

— h-propositions: A happens-at T 

— c-propositions: A initiates F when C, or A terminates F when C 

— r-propositions: L whenever C 

— p-propositions: A needs C. 

T-propositions are used to record observations that particular fluents hold or do
not hold at particular time-points. H-propositions are used to state that
particular actions occur at particular time-points. C-propositions state general
"action laws": the intended meaning of "$A$ initiates $F$ when $C$" is "$C$ is a
minimally sufficient set of conditions for an occurrence of $A$ to initiate $F$".
R-propositions serve a dual role in that they describe both static constraints
between fluents and ways in which fluents may be indirectly affected by action
occurrences. P-propositions state necessary conditions for an action to occur.

The semantics of $\mathcal{E}$ is based on a notion of a model of a domain $D$. A
map $H : \Phi \times \Pi \to \{true, false\}$ is an interpretation of $D$. Given a
time-point $T$ and a fluent constant $F$ we first define the notion of an
initiation-point (termination-point, resp.) for $F$ in $H$ relative to $D$ as
follows. Consider first the case where $D$ contains no r-propositions. Then $T$ is
an initiation-point (termination-point, resp.) for $F$ in $H$ relative to $D$ iff
there is an action constant $A$ such that (i) $D$ contains both an h-proposition
$A$ happens-at $T$ and a c-proposition $A$ initiates (terminates, resp.) $F$ when
$C$, and (ii) $H$ satisfies $C$ at $T$ (i.e. for each $F' \in C$,
$H(F', T) = true$, and for each $F'$ such that $\neg F' \in C$, $H(F', T) = false$).
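For domains without r-propositions, the two notions just defined are easy to compute. The sketch below assumes an interpretation is a dictionary from (fluent, time) pairs to booleans, that negative literals are written with a leading `-`, and that h- and c-propositions are stored as tuples. The function names and the tuple layout are hypothetical; the sample law "SwitchOn initiates Light when {Normal}" and the occurrence at time 2 are the ones used in the Bulb domain example of this paper.

```python
def satisfies(H, C, T):
    """H satisfies the set of fluent literals C at time-point T."""
    return all(H[(lit.lstrip("-"), T)] == (not lit.startswith("-"))
               for lit in C)

def points(H, F, h_props, c_props, kind="initiates"):
    """Initiation-points (or termination-points, with kind='terminates')
    for fluent F in H, for a domain with no r-propositions."""
    return sorted({T
                   for (A, T) in h_props
                   for (A2, k, F2, C) in c_props
                   if A2 == A and k == kind and F2 == F
                   and satisfies(H, C, T)})

# h-propositions as (action, time); c-propositions as
# (action, 'initiates' | 'terminates', fluent, set of literals).
h_props = [("SwitchOn", 2)]
c_props = [("SwitchOn", "initiates", "Light", {"Normal"})]

# One candidate interpretation over time-points 0..4.
H = {("Normal", t): True for t in range(5)}
H.update({("Light", t): t > 2 for t in range(5)})

init_light = points(H, "Light", h_props, c_props)
term_light = points(H, "Light", h_props, c_props, kind="terminates")
```

Note that the result depends on the interpretation: if `Normal` were false at time 2 in `H`, the precondition would fail and 2 would no longer be an initiation-point for `Light`.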

When the domain $D$ contains r-propositions this definition has to be extended to
allow for initiation or termination points that are generated recursively through
these r-propositions.

Definition 1. (Initiation/termination point) Let $H$ be an interpretation of
$\mathcal{E}$, and $D$ be a domain description. Let $W$ be the set
$2^{\Phi \times \Pi} \times 2^{\Phi \times \Pi}$ and let the operator
$\mathcal{F} : W \to W$ be defined as follows. For each $(In, Te) \in W$ denote
$\mathcal{F}((In, Te))$ by $(In', Te')$. Then for any $F \in \Phi$ and
$T \in \Pi$, $(F, T)$ is in $In'$ (resp. in $Te'$) iff one of the following two
conditions holds.

1. There is an $A \in \Delta$ s.t. (i) there is both an h-proposition in $D$ of the
form "$A$ happens-at $T$" and a c-proposition in $D$ of the form "$A$ initiates
$F$ when $C$" (resp. "$A$ terminates $F$ when $C$") and (ii) $H$ satisfies $C$ at
$T$.

2. There is an r-proposition in $D$ of the form "$F$ whenever $C$" (resp.
"$\neg F$ whenever $C$") and a partition $\{C_1, C_2\}$ of $C$ such that (i) $C_1$
is non-empty, for each fluent constant $F' \in C_1$, $(F', T) \in In$, and for each
fluent literal $\neg F' \in C_1$, $(F', T) \in Te$, and (ii) there is some
$T_2 \in \Pi$, $T \prec T_2$, such that for all $T_1$, $T \prec T_1 \preceq T_2$,
$H$ satisfies $C_2$ at $T_1$.

Let $(In^*, Te^*)$ be the least fixed point of the (monotonic) operator
$\mathcal{F}$ starting from the empty tuple $(\emptyset, \emptyset)$. $T$ is an
initiation-point (resp. termination-point) for $F$ in $H$ relative to $D$ iff
$(F, T) \in In^*$ (resp. $(F, T) \in Te^*$).

It is useful to note that any initiation or termination point at some time T 
relative to D, defined in this way, must refer to at least one known h-proposition 
at T in the domain D. 

Given this notion of an initiation and termination point, an interpretation $H$ is
a model of $D$ iff, for every fluent constant $F$ and time-points $T_1 \prec T_3$:

1. If there is no initiation- or termination-point $T_2$ for $F$ in $H$ such that
$T_1 \preceq T_2 \prec T_3$, then $H(F, T_1) = H(F, T_3)$.

2. If $T_1$ is an initiation-point for $F$ in $H$, and there is no
termination-point $T_2$ for $F$ in $H$ such that $T_1 \prec T_2 \prec T_3$, then
$H(F, T_3) = true$.

3. If $T_1$ is a termination-point for $F$ in $H$, and there is no initiation-point
$T_2$ for $F$ in $H$ such that $T_1 \prec T_2 \prec T_3$, then $H(F, T_3) = false$.

4. $H$ satisfies the following constraints:

- For all "$F$ holds-at $T$" in $D$, $H(F, T) = true$, and for all "$\neg F$
holds-at $T'$" in $D$, $H(F, T') = false$.

- For all "$A$ needs $C$" in $D$ and "$A$ happens-at $T$" in $D$, $H$ satisfies $C$
at $T$.

- For all "$L$ whenever $C$" in $D$, and time-points $T$, if $H$ satisfies $C$ at
$T$ then $H$ satisfies $\{L\}$ at $T$.

A domain $D$ entails the t-proposition "$F$ holds-at $T$" ("$\neg F$ holds-at $T$",
resp.), written $D \models F$ holds-at $T$ ($D \models \neg F$ holds-at $T$,
resp.), iff for every model $H$ of $D$, $H(F, T) = true$ ($H(F, T) = false$, resp.).

The first three conditions for a model encapsulate a notion of default persis- 
tence for fluents whereas the fourth condition imposes other constraints on the 
model from explicit information about the fluents given in D. This separation 
allows a modular extension of the language and, as we will see, facilitates the 
development of a proof theory for it.

Example 1. (Bulb Domain: Db)

SwitchOn initiates Light when {Normal}    (Db1)
SwitchOff terminates Light                (Db2)
Break terminates Normal                   (Db3)
¬Light whenever {¬Normal}                 (Db4)
SwitchOn needs {¬Light}                   (Db5)
SwitchOn happens-at 2                     (Db6)
Normal holds-at 0                         (Db7)

In this example, Db entails Light holds-at 4, but this is no longer the case when Db is extended with
Break happens-at 3.
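
The entailment claimed for the bulb domain can be checked with a small forward simulation. The following is only a sketch under stated assumptions, not the semantics of the paper: it assumes discrete time starting at 0, a fully specified initial state (so it computes a single model rather than quantifying over all models), and it treats each "whenever" statement as an implication applied to every new state. The function `project` and the tuple encodings of propositions are hypothetical names introduced for this illustration. The initial value Light = false is chosen so that the precondition SwitchOn needs {¬Light} holds at time 2; the sketch does not check preconditions itself.

```python
def project(initial, events, effects, ramifications, horizon):
    """Forward-project one model; states[t] maps each fluent to True/False."""
    states = [dict(initial)]
    for t in range(horizon):
        now = states[t]
        nxt = dict(now)  # default persistence: copy every fluent value forward
        for action in events.get(t, []):
            for act, fluent, value, conds in effects:
                if act == action and all(now[f] == v for f, v in conds):
                    nxt[fluent] = value  # an event at t takes effect after t
        # Apply each "L whenever C" as an implication on the new state.
        # The bounded number of passes lets chains of ramifications settle.
        for _ in range(len(ramifications) + 1):
            for (fluent, value), conds in ramifications:
                if all(nxt[f] == v for f, v in conds):
                    nxt[fluent] = value
        states.append(nxt)
    return states


# Bulb domain: (action, fluent, new value, preconditions on fluents).
effects = [
    ("SwitchOn", "Light", True, [("Normal", True)]),
    ("SwitchOff", "Light", False, []),
    ("Break", "Normal", False, []),
]
# -Light whenever {-Normal}
ramifications = [(("Light", False), [("Normal", False)])]
initial = {"Light": False, "Normal": True}

base = project(initial, {2: ["SwitchOn"]}, effects, ramifications, 4)
broken = project(initial, {2: ["SwitchOn"], 3: ["Break"]},
                 effects, ramifications, 4)
print(base[4]["Light"], broken[4]["Light"])  # True False
```

In the second run the Break event at time 3 terminates Normal, and the ramification then removes Light at time 4, which matches the behaviour described for the extended domain.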

The above semantics assumes that no events occur other than those explicitly
given in the domain description D. This is not always a valid assumption and it is
possible to have domains where some action types are open, e.g. in the example
above Break could be considered as open. Following work on abduction [9], we
define a generalized model of D as any model of D ∪ Ab, where Ab
is any set of h-propositions over the open action types in D. A corresponding
entailment is then defined in terms of these generalized models.



3 An Argumentation Proof Theory for E



The basic subset of the language E, comprising only h- and c-propositions,
has been re-formulated into the argumentation framework of LPwNF in [11]. In
this section, we give a brief review of the argumentation formulation of E and
show how it can be extended when we extend the syntax of the basic language or
when we allow open action types in a domain description. This results in a proof
theory for E which in turn will form the basis of the E-RES system implementing
the language.

The argumentation re-formulation of E translates a domain D, over the
basic subset of the language, into an argumentation program $P_E(D) = (B(D), A_E, <_E)$
in LPwNF. The background monotonic logic $(\mathcal{L}, \vdash)$ of the LPwNF
framework is:

- $\mathcal{L}$ consists of all sentences $A_0 \leftarrow A_1, \ldots, A_n$ ($n \geq 0$), with $A_i$, $0 \leq i \leq n$,
positive or negative (via a negation or complement operator, $\neg$) literals,
and all variables implicitly universally quantified from the outside, and

- $\vdash$ is obtained by repeatedly applying the classical modus ponens inference rule

$$\frac{A \leftarrow \Gamma, \quad \Gamma}{A}$$

with $A \leftarrow \Gamma$ any ground instance of a sentence in $\mathcal{L}$.



Given D, B(D), called the background theory for D, is given by:

- If A happens-at T ∈ D, then Happens(A,T) ∈ B(D).

- If A initiates F when {L1, ..., Ln} ∈ D, then B(D) contains a rule for
Initiation:

$Initiation(F,t) \leftarrow Happens(A,t), HoldsAt(L_1,t), \ldots, HoldsAt(L_n,t).$

Similarly, for "terminates" c-propositions a rule for Termination is given in
B(D). (Here and below HoldsAt(¬F,t) stands for ¬HoldsAt(F,t).)

The rest of $P_E(D)$ is independent of any given domain D. $A_E$, called the
argumentation theory, consists of:

Generation rules:

$HoldsAt(f,t_2) \leftarrow Initiation(f,t_1),\ t_1 \prec t_2$        (PG[f,t2;t1])
$\neg HoldsAt(f,t_2) \leftarrow Termination(f,t_1),\ t_1 \prec t_2$  (NG[f,t2;t1])

Persistence rules:

$HoldsAt(f,t_2) \leftarrow HoldsAt(f,t_1),\ t_1 \prec t_2$            (PP[f,t2;t1])
$\neg HoldsAt(f,t_2) \leftarrow \neg HoldsAt(f,t_1),\ t_1 \prec t_2$  (NP[f,t2;t1])

Assumption rules:

$HoldsAt(f,t)$        (PA[f,t])
$\neg HoldsAt(f,t)$   (NA[f,t])

Also, $<_E$ is a priority relation defined over $A_E$ by:

NP[f,t;t1] $<_E$ PG[f,t;t2] iff $t_1 \preceq t_2$,
NG[f,t;t1] $<_E$ PG[f,t;t2] iff $t_1 \prec t_2$,
PA[f,t] $<_E$ NG[f,t;t'], PA[f,t] $<_E$ NP[f,t;t'],

together with the corresponding cases where positive rules are replaced by
negative rules and vice versa.

The essential element of this translation is that it formalizes that the effects
of later events take priority over the effects of earlier events. The argumentation
semantics of $P_E(D)$ is given via the admissible extensions of B(D). These are
subsets, S, of argument rules from $A'_E \subset A_E$, consisting only of generation or
assumption rules, which are added to B(D). An extension S is admissible iff:

- it is consistent, i.e. non-self-attacking, and
- it (counter-)attacks any set of arguments attacking it.

A set of argument rules, A, attacks another such set, B, if the two sets are
in conflict, by monotonically deriving (in $\vdash$), together with B(D), complementary
literals L and ¬L, respectively, and A is not of lower priority than B. A set A
is of lower priority than B if it has a rule of lower priority than some rule in B
and does not contain any rule of higher priority than some rule in B.

Given this translation it can be shown, under some quite general restrictions
on D, that the models of D correspond exactly to the maximal (w.r.t. set
inclusion) admissible extensions of $P_E(D)$. We can then use this translation to
develop an argumentation-based proof theory for E. This proof theory is
defined in terms of derivations of trees, whose nodes are sets of arguments in $A_E$
attacking the arguments in their parent nodes.
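
The priority comparison between sets of argument rules described above can be sketched in a few lines. This is an illustration only: the tuple encoding `(kind, fluent, time, source_time)` of an argument rule and the function names `rule_lower` and `set_lower` are assumptions made for this sketch, times are taken to be integers, and the rule-level relation follows the priority relation as it is read from the text (persistence and assumptions yield to generation, and an earlier generation yields to a later one).

```python
OPPOSITE = {"P": "N", "N": "P"}


def rule_lower(r1, r2):
    """True if argument rule r1 has lower priority than r2.

    A rule is (kind, fluent, time, source_time); kind is one of
    PG, NG, PP, NP, PA, NA and source_time is None for assumptions.
    """
    kind1, f1, t1, src1 = r1
    kind2, f2, t2, src2 = r2
    # Only rules with opposite conclusions about the same fluent and time
    # are related by the priority relation.
    if f1 != f2 or t1 != t2 or kind1[0] != OPPOSITE[kind2[0]]:
        return False
    role1, role2 = kind1[1], kind2[1]
    if role1 == "A":                      # assumptions are weakest
        return role2 in ("G", "P")
    if role1 == "P" and role2 == "G":     # persistence vs. generation
        return src1 <= src2
    if role1 == "G" and role2 == "G":     # earlier event vs. later event
        return src1 < src2
    return False


def set_lower(a, b):
    """True if the set a is of lower priority than the set b."""
    has_lower = any(rule_lower(ra, rb) for ra in a for rb in b)
    has_higher = any(rule_lower(rb, ra) for ra in a for rb in b)
    return has_lower and not has_higher


earlier = ("NG", "Protected", 5, 2)
later = ("PG", "Protected", 5, 3)
assumptions = {("NA", "Protected", 5, None), ("NA", "TypeO", 2, None)}
generation = {("PG", "Protected", 5, 2), ("PA", "TypeO", 2, None)}
print(rule_lower(earlier, later), set_lower(assumptions, generation))
```

Under this encoding a set attacks another only if the two are in conflict and `set_lower` returns false for the attacker, which is why a set built on a generation rule can attack a set that relies only on assumptions, but not the other way round.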
Let $S_0$ be a (non-self-attacking) set of arguments in $A_E$ such that
$B(D) \cup S_0 \vdash (\neg)HoldsAt(F,T)$. Then, two kinds of derivations are defined:

- Successful derivations $S_0, \ldots, S$, building, from a tree consisting only of the
root $S_0$, a tree whose root S is an admissible subset of $A_E$ such that $S \supseteq S_0$.

- Finitely failed derivations, guaranteeing the absence of any admissible set of
arguments containing $S_0$.

Then, for the given literal $L = (\neg)HoldsAt(F,T)$, we say that L is a sceptical
consequence of D iff (i) there exists a successful derivation starting with $S_0$ and
(ii) for every set $S'_0$ of argument rules in $A'_E$ such that $B(D) \cup S'_0$ derives (in $\vdash$)
the complement of L, every derivation for $S'_0$ is finitely failed. If only the first
condition, (i), holds we say that L is a credulous consequence of D.

The formal details of the derivations are not needed for this paper. Informally,
both kinds of derivation incrementally consider all attacks against the
root and, whenever the root does not counter-attack one of these attacks, a new set of
arguments that can do this is added to the root. Then, the process is repeated,
until every attack has been counter-attacked successfully (successful derivation)
by the extended root or until some attack cannot be counter-attacked by any
extension of the root (finitely failed derivations). Examples of this proof theory
will be presented in the next section.

An important feature of the argumentation re-formulation of E is the fact
that it is modular with respect to the addition of new types of sentences in
the language. This follows primarily from the fact that the translation is faithful
at the level of the models of the language and so it can reflect the modular
separation of the model definition into two parts: conditions (1-3) encapsulating
default persistence and condition 4 for extra constraints. When we add
r-propositions we only need to extend the background definitions of Initiation
and Termination in B(D) without changing the type of arguments in $P_E(D)$.
For each r-proposition L whenever C, a fact Whenever(L,C) is added to B(D), and the
definitions of Initiation and Termination are augmented with:

$Start(l,t) \leftarrow Whenever(l,c), Select(c, l_1, \{l_2, \ldots, l_n\}),$
$\qquad Start(l_1,t), HoldsAt(l_2,t_+), \ldots, HoldsAt(l_n,t_+)$

where for a positive literal l = F, Start(l,t) is to be read as Initiation(F,t)
and for a negative literal, l = ¬F, as Termination(F,t), and $t_+$ is the next
immediate time after t. Hence every event that brings about any literal, $l_1$, of C
while the rest of C, $\{l_2, \ldots, l_n\}$, continues to hold also brings about, through
the r-proposition, L.

In turn, the only extension required to the proof theory is to add, for any
r-proposition L whenever {L1, ..., Lk}, to the root S of any derivation a set
of arguments, $S_r$, so that $B(D) \cup S \cup S_r \vdash \phi$ where $\phi$ is the (classical)
formula $HoldsAt(L,t) \leftarrow HoldsAt(L_1,t), \ldots, HoldsAt(L_k,t)$. Similarly, when we
extend the language with t- and p-propositions we need to extend the proof
theory by adding to the root of a derivation a set of arguments, $S_t$ and $S_p$, so
that they can derive (with B(D)) the HoldsAt literals corresponding to these
sentences. The proof theory continues then as before but now with the extra
attacks against $S_r$, $S_t$ and $S_p$ to be considered.

Finally, when we have open action types in the domain D the proof theory
is extended to allow the abduction of a new set of events, H, and hence new
generation arguments can be added to the root. Derivations are now defined in
terms of tuples, <H, S>. In this extended proof theory it is possible for new
attacks to be generated during the derivation due to the new events abduced.
The proof theory therefore now includes suspended attacks which can become
actual attacks when H grows. If this happens then these attacks need to be
counter-attacked as usual; suspended attacks that remain so until the
end of the derivation are ignored.

Theorem 1. (Soundness and Completeness of the Extended Proof Theory)

Let D be a domain description, possibly with open action types, and $S_0$ a
consistent set of arguments. If there exists a successful extended derivation from
<∅, $S_0$> to <H, S> then there exists a generalized model, M, of D such
that (i) M is also a model of D ∪ H and (ii) M satisfies L for any literal L s.t.
$B(D \cup H) \cup S \vdash L$. Also, if every extended derivation from <∅, $S_0$> is finitely
failed, then there exists no generalized model, M, of D such that M satisfies L
for every literal L s.t. $B(D \cup H) \cup S_0 \vdash L$ where H is the set of h-propositions
corresponding to M.

Conversely, let M be a generalized model of D such that its corresponding set
of h-propositions H is finite and M satisfies L. Then there exists a set of
arguments $S_0$ and a successful extended derivation from <∅, $S_0$> to <H', S> such
that $B(D \cup H') \cup S_0 \vdash L$ and $H' \subseteq H$.

4 Reasoning with the Language E

The language E can support in a natural way a variety of modes of reasoning with
actions and observations. The argumentation-based computational model for E,
described in the previous section, allows a principled implementation of these
forms of reasoning. In this section we present some of these forms of reasoning
and explain briefly how they are mapped into argumentation.

4.1 Default Persistence

The argumentation translation of the language E maps the basic reasoning of
default persistence captured by the model theoretic semantics of E into
argumentation reasoning. Consider the following example where vaccine A provides
protection only for people with blood type O, and vaccine B for people with
blood type other than O.



(Footnote) All results in this paper refer to domains with discrete linear time, a finite number
of h-propositions and a restriction that limits the possibility for events to
simultaneously initiate and terminate the same fluent.

Example 2. (Vaccinations - No open actions)

InjectA initiates Protected when {TypeO}     (Dv1)
InjectB initiates Protected when {¬TypeO}    (Dv2)
InjectA happens-at 2                         (Dv3)
InjectB happens-at 3                         (Dv4)



Given this domain, we can show that at any time Tf after 3, G = {Protected
holds-at Tf} is a sceptical consequence of Dv whereas for times less than or equal
to 3 Protected is only a credulous consequence. An argument $S_0$ for G is given
by $S_0$ = {PG[Protected,Tf;2], PA[TypeO,2]}, i.e. a generation argument based
on the event of InjectA at time 2 together with an assumption argument for
TypeO at time 2. All attacking arguments against this can be counter-attacked
(defended) by $S_0$ itself. This gives a successful derivation for $S_0$ and thus G is
a credulous consequence. To show that it is a sceptical consequence we consider
the opposite goal ¬G.

The only way to derive this is through the argument $R_1$ = {NA[Protected,Tf]}
(there are no generation arguments for ¬G). This is attacked by $S_0$ given
above, which can be counter-attacked (only) if $R_1$ is extended to $R_2$ with the
assumption argument NA[TypeO,2]. But $R_1$ and thus also $R_2$ are also attacked
by {PG[Protected,Tf;3], NA[TypeO,3]} via the event InjectB at time 3. To defend
against this it is now necessary to add PA[TypeO,3] to $R_2$ to give $R_3$ =
{NA[Protected,Tf], NA[TypeO,2], PA[TypeO,3]}. But then we have a new attack
against $R_3$ given by {NP[TypeO,3;2], NA[TypeO,2]} through a persistence
argument from time 2 to time 3. This attack can only be counter-attacked via a
generation argument for HoldsAt(TypeO,3). But no such arguments exist in Dv
and hence the derivation for ¬G finitely fails, as required.
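
The sceptical and credulous status of Protected in the vaccination example can be cross-checked by brute force over models. The sketch below is not the argumentation proof procedure; it assumes discrete time starting at 0, and it relies on the fact that no action in this domain affects TypeO, so a model is fixed by the initial values of TypeO and Protected. The function names are hypothetical and the domain is hard-coded.

```python
from itertools import product


def protected_at(t_query, type_o, protected0):
    """Value of Protected at t_query in the model fixed by the initial values."""
    protected = protected0
    for t in range(t_query):          # events at t take effect after t
        if t == 2 and type_o:         # InjectA initiates Protected when {TypeO}
            protected = True
        if t == 3 and not type_o:     # InjectB initiates Protected when {-TypeO}
            protected = True
    return protected


def consequence_status(t_query):
    """Classify 'Protected holds-at t_query' over all four models."""
    values = [protected_at(t_query, type_o, p0)
              for type_o, p0 in product([True, False], repeat=2)]
    if all(values):
        return "sceptical"
    if any(values):
        return "credulous"
    return "none"


for t in (2, 3, 4, 5):
    print(t, consequence_status(t))
```

Protected holds in every model only after both injections have taken effect, that is, at times after 3; at time 3 it holds only in the models where TypeO is true or where Protected was true initially.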

This example shows how the argumentation reasoning deals correctly with
default persistence under incomplete information. For a more complex example
consider the same goal G in the domain below where the fluents TypeO and
Strong are incompletely specified.

Example 3. (Vaccinations, continued)

InjectB initiates Protected when {¬TypeO}    (Dv1)
InjectC initiates Protected when {Strong}    (Dv2)
Strong whenever {TypeO}                      (Dv3)
InjectB happens-at 2                         (Dv4)
InjectC happens-at 3                         (Dv5)

As above, derivations for ¬G finitely fail. The two attacks against ¬G, via
the two injection events, can only be counter-attacked by {PA[TypeO,2]} and
{NA[Strong,3]}. But then the satisfaction of the ramification statement
requires {PA[Strong,2]} to be added. In turn this gives a new persistence attack
of Strong from time 2 to 3 which cannot be counter-attacked as there is no
generation rule for ¬Strong.

4.2 Assimilating Observations and Diagnosis

A domain description in the language E may contain observations
(t-propositions) about some of its fluents. The observations can refer either to some initial
time or any other time point. An argument for a default conclusion to be valid
must also be extensible to an admissible superset that is able to confirm these
observations. This extra requirement gives a form of reasoning from effects to
causes both forward and backward in time.

Example 4. (Infections - No open actions)

Expose initiates Infected when {TypeA}     (Di1)
Expose initiates Infected when {TypeB}     (Di2)
Allergic whenever {TypeA, Infected}        (Di3)
Allergic whenever {TypeB}                  (Di4)
Expose happens-at 3                        (Di5)
¬Infected holds-at 1                       (Di6)
Infected holds-at 6                        (Di7)



The observation at time 6 requires that a generation rule argument for Infected
at 6 is added to the root of any derivation. The weaker assumption argument
PA[Infected,6] cannot defend against its persistence attack starting from time
1 where the observation of ¬Infected is given. The only possibility for such a
generation rule is the one based on the event of Expose at time 3 with either
TypeA or TypeB assumed at time 3, and consequently at any other time before
or after 3. Under any one of these assumptions the two r-propositions imply
that Allergic would hold from 6. The argumentation reasoning is thus able to
derive that ¬Allergic cannot be derived credulously and hence that Allergic holds
sceptically from 6 onwards.

Effectively, these observations are explained in terms of missing information on 
incomplete fluents. When a domain contains open action types this gives us a 
form of diagnosis of the observations in terms of assumptions both on incomplete 
fluents and on unknown (in D) events. 

Definition 2. (Diagnosis in E)

Let D be a given domain description (see footnote) and O a set of observations. A (strong)
diagnosis for O in D is a set H of h-propositions s.t. D ∪ H is consistent and
$D \cup H \models O$.

A weaker form of diagnosis useful when we have incomplete information on 
fluents whose truth cannot be affected by any action (e.g at some initial time 
point), is as follows. 

Definition 3. (Conditional Diagnosis in E)

Let D be a domain description and O a set of observations. Then a weak
diagnosis for O in D is a set H of h-propositions s.t. there exists a model M of
D ∪ H where $M \models O$. H is conditional on a set of assumptions A iff A is a
set of t-propositions ($A \cap O = \emptyset$) such that H is a (strong) diagnosis for O in
D ∪ A. The tuple <H, A> is called a conditional diagnosis for O in D.

(Footnote to Definition 2) For simplicity of presentation we will assume that the domain does not contain
any p-propositions, P. If it does, then we have an extra requirement on the
diagnosis that $D \cup H \models P$.

Note that the assumptions A in a conditional diagnosis can refer to any time
point, not necessarily to an initial time point only. In the previous example the
empty set H = ∅ is a weak diagnosis for the observation Infected holds-at 6
in the domain given by the sentences Di1-Di7. Two conditional diagnoses
are <∅, TypeA holds-at 3> and <∅, TypeB holds-at 3>. Note that <∅,
TypeA holds-at 1> is also a conditional diagnosis. The assumption that TypeA
holds at some time point (e.g. an initial time point 1) implies that it also holds
at any other time point as no action in Di can affect the value of this fluent.
Typically, if we have incomplete information on fluents that cannot be affected
by any action, or the information is incomplete at some initial time point before
which actions cannot occur, then a conditional diagnosis is appropriate.

Theorem 2. Let D be a given domain description and O a set of observations.
Let also $S_0 \subseteq A_E$ be a set of arguments and $H_0$ a set of action facts such that
$B(D \cup H_0) \cup S_0 \vdash O$. If there exists a successful extended derivation in D from
<$H_0$, $S_0$> to <H, S> then <H, A> is a conditional diagnosis for O in D,
where A = {F holds-at T | PA[F,T] ∈ S} ∪ {¬F holds-at T | NA[F,T] ∈ S}.

To illustrate this computation of conditional diagnoses let us consider again
example 1 where Db7 is absent and Break is an open action type. Suppose we
are given the observations: Light holds-at 4 and ¬Light holds-at 6.

To assimilate the first observation we can use a generation argument, $S_0$ =
{PG[Light,4;2], PA[Normal,2]}, based on the given event of SwitchOn. This
can defend itself against all its attacks except possibly an attack via a generation
argument based on an event of Break at a time after 2 and before 4. As we have
no such event in our computed diagnosis this remains suspended. Also, because
the p-proposition in D requires that ¬Light holds at 2, $S_0$ will be extended
to $S_1$ with NA[Light,2]. To assimilate the second observation the only way we
can extend $S_1$ is via a generation argument based on an event of Break at a
time before 6. Hence $S_1$ is extended to $S_2 = S_1 \cup$ {NG[Light,6;T]} and $H_0$
to $H_1$ = {Break happens-at T} for T < 6. Note that this generation of ¬Light
is indirect through the ramification statement in the theory.

Adding this new event results in the re-examination of the suspended attack
from before. In general, there are two ways to deal with this situation. One way
is to constrain the time of the new event so that it does not lead to an actual
attack. The other is to counter-attack this attack in the usual way. In this case,
the second option is not available as we cannot assume SwitchOn events. Hence
we are forced to set T ≥ 4. The computation then concludes successfully with
<$H_1$, $S_2$>, giving a set of conditional diagnoses (one for each T in [4,6)):

<{Break happens-at T}, {Normal holds-at 2, ¬Light holds-at 2}>.

A computed conditional diagnosis <H, A> in D can be tested to see if this
is a strong diagnosis by checking whether the assumptions A follow sceptically
from D ∪ H. In the previous example, {Normal holds-at 2} is not a sceptical
consequence (the domain is incomplete on this fluent) and hence the diagnosis
needs this condition.
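
The conditional diagnoses of the bulb example can be reproduced by exhaustive search, which gives a useful sanity check on the time constraint for the abduced event. This is a sketch under explicit assumptions and not the abductive proof procedure described above: time is discrete and starts at 0, at most one Break event is abduced, the initial values of Normal and Light are unknown and enumerated, and the domain is hard-coded. The function names are hypothetical.

```python
from itertools import product


def bulb_timeline(break_time, normal0, light0, horizon=6):
    """Return [(normal, light)] for times 0..horizon, or None if no model."""
    if not normal0 and light0:
        return None                    # violates -Light whenever {-Normal}
    normal, light = normal0, light0
    timeline = [(normal, light)]
    for t in range(horizon):
        next_normal, next_light = normal, light   # default persistence
        if t == 2:                     # SwitchOn happens-at 2
            if light:
                return None            # SwitchOn needs {-Light}
            if normal:
                next_light = True      # SwitchOn initiates Light when {Normal}
        if t == break_time:
            next_normal = False        # abduced event: Break terminates Normal
        if not next_normal:
            next_light = False         # ramification: -Light whenever {-Normal}
        normal, light = next_normal, next_light
        timeline.append((normal, light))
    return timeline


def conditional_diagnoses(horizon=6):
    """Pairs (T, (Normal at 2, Light at 2)) explaining both observations."""
    found = []
    for break_time in range(horizon + 1):
        for normal0, light0 in product([True, False], repeat=2):
            tl = bulb_timeline(break_time, normal0, light0, horizon)
            # Observations: Light holds-at 4 and -Light holds-at 6.
            if tl is not None and tl[4][1] and not tl[6][1]:
                found.append((break_time, tl[2]))
    return found


print(conditional_diagnoses())  # [(4, (True, False)), (5, (True, False))]
```

The search returns a Break event at time 4 or 5 only, each conditional on Normal holding and Light not holding at time 2, in agreement with the diagnoses computed in the text.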

5 The E-RES System: Implementing E

The argumentation-based proof theory described in section 3 forms the basis
of a principled implementation of the language E into a system, called E-RES.
The computational effectiveness of this system depends on two main factors:
(a) reducing the number of attacks considered for the goal at hand by restricting
attention to attacks that are necessary, and (b) improving the efficiency of the
satisfaction of the global constraints imposed by the t-, p- and r-propositions. A
major optimization that we can apply with respect to the first factor concerns
the consideration of persistence attacks.

Definition 4. A restricted attack against a set S is a minimal attack on S
which does not contain any persistence rule PP[F,T';T] (resp. NP[F,T';T])
unless S contains the assumption rule NA[F,T] (resp. PA[F,T]) and
$B(D) \cup S \vdash HoldsAt(F,T')$ (resp. $B(D) \cup S \vdash \neg HoldsAt(F,T')$).

Lemma 1. Let D be a domain and S a set of argument rules that is consistent 
and attacks all the restricted attacks against it. Then there exists a superset of S 
which is admissible. 

This means that we only need to consider those persistence attacks against a
set S that start from assumptions that are in S. In the implementation of E-RES
we exploit this lemma by considering a notion of suspended persistence attacks
on an assumption which are activated whenever the contrary assumption (at
another time point) is added to S.

5.1 Satisfiability of Constraints in E-RES

The global constraints imposed by the t-, p- and r-propositions can be
computationally demanding. Although most of these constraints refer to a single time
point, those imposed by the ramification statements need to be satisfied at every
time point and hence could be a major source of inefficiency. The lemma below
allows us to address this by confining this task to a specific set of time points.

Lemma 2. Let D be a domain and $T_1, T_2$ ($T_1 < T_2$) be time points such that
there is no h-proposition in D at any time T in $(T_1,T_2)$. Suppose also that
there exists a partial model $M_p$ of D defined over the whole time line minus the
interval $(T_1,T_2]$, except at time points in $(T_1,T_2]$ where t-propositions are given
in D, where $M_p$ satisfies the conditions imposed by these. $M_p$ also satisfies any
p-propositions at $T_2$. Then if there exists a time point T in $(T_1,T_2]$ such that $M_p$
can be extended to a partial model covering also T then $M_p$ can be extended to
a full model of D.

Hence when D contains only a finite number of h-propositions we can split
the (linear) time line into a finite number of time intervals and satisfy the
ramification constraints only at one time point in each of these intervals. E-RES
implements an interleaved process of (a) satisfiability of the ramification
statements as classical implications at these time points and (b) cross-checking of the
assumptions required in (a) under the default persistence of the language E.
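
The splitting of the time line suggested by Lemma 2 can be sketched as follows. The sketch assumes integer time points, a bounded window from `start` to `horizon`, and an arbitrary choice of the first point of each interval as its representative; the function name and the triple format are hypothetical and are not taken from the E-RES implementation.

```python
def ramification_checkpoints(event_times, start, horizon):
    """Split start..horizon into intervals (lo, hi] free of events inside.

    Returns triples (lo, hi, representative): no event time lies strictly
    between lo and hi, and representative is one time point in (lo, hi].
    """
    cuts = sorted({t for t in event_times if start <= t < horizon})
    bounds = [start - 1] + cuts + [horizon]
    return [(lo, hi, lo + 1) for lo, hi in zip(bounds, bounds[1:])]


# Events (h-propositions) at times 2 and 3, reasoning window 0..6.
for lo, hi, rep in ramification_checkpoints([2, 3], 0, 6):
    print(f"({lo}, {hi}] -> check ramifications at {rep}")
```

With two events the ramification constraints are checked at three time points instead of seven, and the number of checkpoints grows with the number of h-propositions rather than with the length of the time line.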

As the number of (ground) ramification constraints at each time point can
be large we can employ a SAT solver [6] within the E-RES system to carry out
this process (a) of generating a classical model for these. Furthermore, we have
considered a notion of relevancy of ramifications to the query at hand which,
assuming that D is consistent, selects at each time point only a subset of
ramification constraints. Therefore we now have an iterative (over the finite number
of time points) process of interleaving between: (i) projecting the assumptions
that we have added to S so far to the current time point and selecting the
relevant ramification constraints based on these, (ii) generating a classical model
of these constraints using a SAT solver given the partial instantiation generated
in (i), and (iii) ensuring the compatibility of this model with the arguments S
computed so far at the previous time points. Note that the output of steps (ii)
and (iii) could affect the set of relevant ramifications computed in (i) and hence
we need to repeat the whole process before going to the next time point.

Initial experiments with this iterative method indicate a significant reduction 
in the computation. Note however that we are still left with the problem of 
deciding which t-propositions are relevant to the query/goal at hand. Currently, 
we assume that these are selected externally to the system. 

The E-RES system is currently implemented in Prolog (Eclipse 4.2). An
interface allows the user to define the domain description directly in the syntax
of the language E. The system also supports some extra forms of auxiliary information,
e.g. that a fluent is constant and so does not change over time. Open action types
are specified together with their associated p-propositions and any priority
information amongst them that might exist. In addition, although E is defined as a
propositional language the E-RES system allows domain descriptions to be given in a
non-propositional form under some restrictions. An early version of the system
with examples is available from http://www.ucl.ac.uk/~uczcrsm/LanguageE/.
New versions of the system will be added to this web site in the near future.

6 Related Work and Conclusions 

Recently there has been a wide interest in developing specialized action
languages [5]. These efforts have concentrated on the formal semantics of such
languages and how they can be applied to specific problems. Examples of these are
the language Golog [12] or the Fluent Calculus as developed in [15] for cognitive
robotics, a circumscriptive Event Calculus [14] and the language C [8] for
planning, and the language C together with others related to it [1,4] for the problem
of diagnosis. Our work focuses on the general computational aspects of such
languages using argumentation and abduction as a basis for a principled
implementation of the language E. A system, called Causal Calculator [113], which is
based on the language C translates the whole representation into a propositional
theory and then uses a SAT solver to find a solution to its query. A systematic
comparison of the E-RES system with these systems would be useful.

An interesting feature of our approach is the possibility it opens of synthesiz- 
ing, in the implementation of these languages, the resolution based computation 
of argumentation and abduction in Logic Programming with the propositional 
satisfiability methods of SAT solvers. A SAT-based procedure has also been used 
recently in C [7] for planning. This hybrid computational model, that could also 
include other methods e.g. constraint solving, is an important topic of future 
work. Currently, the system is designed with emphasis on the complexity of 
reasoning that it can perform rather than on the efficiency of large scale com- 
putation. We are studying ways to improve this by investigating further notions 
of relevancy in order to dynamically focus the computation only on the parts of 
the theory, especially of t-propositions, that are needed for the query at hand.

Abstract. We define a new syntactic class of logic programs, omega-
restricted programs. We divide the predicate symbols of a logic program
into two parts: domain and non-domain predicates, where the domain
predicates are defined by the maximal stratifiable subset of the rules of
the program. We extend the usual definition of stratification by adding
a special omega-stratum that holds all unstratifiable predicates of the
program. We demand that all variables that occur in a rule also occur
in the rule body in a positive literal that is on a lower stratum than the rule
head. This restriction is syntactic and can be checked efficiently. The
existence of a stable model of an omega-restricted program is decidable
even when function symbols are allowed. We prove that the problem is
2-NEXP-complete and identify subclasses of omega-restricted programs
such that the problem stays in NEXP or NP. The class of omega-
restricted programs is implemented in the Smodels system.



1 Introduction 

The answer set programming (ASP) paradigm has gained popularity in the re- 
cent years as a number of ASP systems have become available (for example, 
DeReS [3], dlv [6], and Smodels [12]). The basic idea of ASP is to encode a 
problem as a logic program such that the answer sets (stable models) of the pro- 
gram correspond to the solutions of the problem. We then use a logic program 
engine to find the answer sets of the program. The underlying formal semantics 
is usually based on some extension of the stable model semantics of normal logic 
programs [7]. 

The inference engines of the existing systems work with ground programs,
that is, programs without variables. A rule with variables represents the set of
ground rules that can be created by replacing the variables in it by constant
terms that occur in the program. This instantiation is done in a preprocessing
step before the actual inference engine is used. This bottom-up approach to
variable use has prevented the use of function symbols since even one function
symbol in a program forces its Herbrand instantiation to be infinite. However,
in most cases it is enough to examine only a small subset of the Herbrand
instantiation since the vast majority of the rules will have unsatisfiable bodies so
they can be left out without affecting the set of stable models. This holds true
even when function symbols are allowed; it is possible that all answer sets of a
program are finite and computable even if the Herbrand instantiation is infinite.

The aim of this work is to define a new class of logic programs, ω-restricted
programs, that are syntactically guaranteed to be decidable even when function
symbols are used. The basic idea is to construct a hierarchy of predicates such
that the predicates on the lowest level are defined using only ground facts and
all variables that occur in a rule of level n + 1 have to also occur in a positive
literal of level n or lower in the rule body. The definition of the hierarchy extends
the usual concept of stratification [1] by adding a special ω-stratum to hold the
unstratifiable part of a program.

It turns out that this syntactic restriction is strong enough to guarantee finite
answer sets. In fact, we will see that deciding whether an ω-restricted program
has an answer set is 2-NEXP-complete. Since the stable model semantics of
logic programs without functions is NEXP-complete [4], we can conclude that
by using ω-restricted functions we move up one step in the exponential hierarchy.
This result also implies that we cannot solve all computable problems using
ω-restricted programs. Recently P. Bonatti [2] has proposed a computationally
complete class of logic programs called finitary programs. However, together
with Turing equivalence comes semi-decidability of general reasoning problems.

The ω-restricted programs have been implemented in the Smodels
system [12] that has been designed at Helsinki University of Technology. The
Smodels system is available at http://www.tcs.hut.fi/Software/smodels.

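In the following sections we will use the following program to illustrate the
basic concepts of ω-restriction:

$$\begin{aligned}
\mathit{number}(0) &\leftarrow \ ;\ \cdots\ ;\ \mathit{number}(n) \leftarrow \\
\mathit{odd}(x+1) &\leftarrow \mathit{number}(x),\ \mathit{even}(x) \\
\mathit{even}(x+1) &\leftarrow \mathit{number}(x),\ \mathit{odd}(x) \\
\mathit{even}(0) &\leftarrow \\
\mathit{two\text{-}divides}(x) &\leftarrow \mathit{even}(x) \\
\mathit{interesting}(x) &\leftarrow \mathit{number}(x),\ \mathit{not}\ \mathit{dull}(x) \\
\mathit{dull}(x) &\leftarrow \mathit{number}(x),\ \mathit{not}\ \mathit{interesting}(x) \\
\mathit{interesting\text{-}odd}(x) &\leftarrow \mathit{odd}(x),\ \mathit{interesting}(x).
\end{aligned} \tag{1}$$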



2 The Stable Model Semantics 

The basic component of a logic program is an atom of the form:

\[ p(t_1, \ldots, t_n) \tag{2} \]

where $p$ is an $n$-ary predicate symbol ($n \ge 0$) and $t_1, \ldots, t_n$ are
terms. A term is either a variable $v$, a constant $c$, or an $m$-ary function
symbol $f(t_1, \ldots, t_m)$ where $t_1, \ldots, t_m$ are terms. We denote the
predicate symbol of an atom $A$ by $\mathit{pred}(A)$. A literal is either an
atom $A$ or its negation not $A$.

The set $V(t)$ of variables that occur in a term $t$ is defined as follows:

\[
V(t) = \begin{cases}
\emptyset, & \text{if } t \text{ is a constant;} \\
\{t\}, & \text{if } t \text{ is a variable; and} \\
\bigcup_{i=1}^{m} V(t_i), & \text{if } t \text{ is a function } f(t_1, \ldots, t_m).
\end{cases}
\tag{3}
\]

A variable occurs in a literal if it occurs in at least one of its arguments:

\[ V(p(t_1, \ldots, t_n)) = \bigcup_{i=1}^{n} V(t_i). \tag{4} \]

An inference rule $R$ is of the form:

\[ h \leftarrow l_1, \ldots, l_n \tag{5} \]

where the head $h$ is an atom and $l_1, \ldots, l_n$ in the body are literals.
The sets of positive and negative literals in the body of $R$ are denoted by
$\mathit{body}^+(R)$ and $\mathit{body}^-(R)$, respectively. Intuitively, a rule
asserts that if all literals in the rule body are true, then the head must be
true also. A logic program $P$ is a finite set of rules. We denote the set of
predicate symbols that occur in $P$ by $\mathit{preds}(P)$.

The set of variables that occur in a rule $R$ is defined in terms of variables
that occur in its literals:

\[ V(R) = V(\mathit{head}(R)) \cup \bigcup_{l \in \mathit{body}(R)} V(l). \tag{6} \]

A rule is ground if $V(R) = \emptyset$.

Let $P$ be a variable-free logic program and $M$ be a set of atoms that occur
in $P$. Then, the Gelfond-Lifschitz reduct $P^M$ is obtained by:

1. removing each rule with a negative literal not $A$ in its body where $A \in M$; and

2. removing all negative literals from the bodies of the remaining rules.

Since $P^M$ is negation-free, it has a unique least model $M'$. If the model $M'$
coincides with $M$, then $M$ is a stable model of $P$.

The Herbrand universe $HU(P)$ of a logic program $P$ is the set of constant
terms that can be formed using the constants and function symbols of $P$. A
ground instance of a literal or a rule can be obtained by replacing variables in
it by terms in $HU(P)$. The Herbrand instantiation $P_G$ is the set of all possible
instantiations of rules in $P$. The set of stable models of a logic program $P$
with variables is defined to be the set of stable models of its instantiation $P_G$.
In practice, we usually do not have to construct the full Herbrand instantiation
to be able to construct all stable models. Hereafter we will use the term
instantiation of $P$ to mean any subset of $P_G$ that has the same set of stable
models.

Example 1. Let $P$ be the program:

    a(1) ← ; a(2) ←
    b(x) ← a(x), not c(x)                                           (7)
    c(x) ← a(x), not b(x).

Fig. 1. The dependency graph of Program (1)



Then, the instantiation $P_G$ is:

    a(1) ←                        a(2) ←
    b(1) ← a(1), not c(1)         b(2) ← a(2), not c(2)             (8)
    c(1) ← a(1), not b(1)         c(2) ← a(2), not b(2).

Now, $P_G$ has four stable models: $M_1 = \{a(1), a(2), b(1), b(2)\}$,
$M_2 = \{a(1), a(2), b(1), c(2)\}$, $M_3 = \{a(1), a(2), c(1), b(2)\}$, and
$M_4 = \{a(1), a(2), c(1), c(2)\}$. Consider $M_1$. The reduct $P_G^{M_1}$ is
the program:

    b(1) ← ;  b(2) ← ;  a(1) ← ;  a(2) ← .                          (9)

The least model of $P_G^{M_1}$ is $\{a(1), a(2), b(1), b(2)\} = M_1$, so we see
that $M_1$ is really a stable model of $P_G$ and hence of $P$.
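
The check carried out by hand in Example 1 can be mechanized. The following Python fragment is only a minimal sketch under our own assumptions (atoms are plain strings, rules are triples, and candidate models are enumerated by brute force); it is not how Smodels computes stable models.

```python
from itertools import chain, combinations

# Ground program (8): each rule is (head, positive_body, negative_body).
RULES = [
    ("a(1)", [], []),
    ("a(2)", [], []),
    ("b(1)", ["a(1)"], ["c(1)"]),
    ("b(2)", ["a(2)"], ["c(2)"]),
    ("c(1)", ["a(1)"], ["b(1)"]),
    ("c(2)", ["a(2)"], ["b(2)"]),
]


def reduct(rules, model):
    """Gelfond-Lifschitz reduct: drop blocked rules, then drop negative literals."""
    return [(h, pos) for h, pos, neg in rules if not any(a in model for a in neg)]


def least_model(positive_rules):
    """Least model of a negation-free ground program by naive forward chaining."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in positive_rules:
            if head not in model and all(a in model for a in pos):
                model.add(head)
                changed = True
    return model


def stable_models(rules):
    """Keep every candidate set that equals the least model of its own reduct."""
    atoms = sorted({h for h, _, _ in rules})
    candidates = chain.from_iterable(
        combinations(atoms, k) for k in range(len(atoms) + 1)
    )
    return [set(c) for c in candidates if least_model(reduct(rules, set(c))) == set(c)]


MODELS = stable_models(RULES)
```

The enumeration is exponential in the number of atoms, so the sketch is only meant for programs as small as the one in Example 1.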

A splitting set of a normal logic program $P$ is a set of ground atoms $U$ such
that for all rules $R$ in $P_G$, if $\mathit{head}(R)$ is in $U$, then all atoms
occurring in the rule body are also in $U$. We denote the set of ground rules
whose heads are in $U$ by $b_U(P_G)$. Given an evaluation $I$ of atoms in $U$,
we denote by $e_U(P_G, I)$ the set of ground rules that is obtained by removing
from $P_G$ all rules that have a literal $l$ containing an atom of $U$ that is
not satisfied by $I$, and removing all literals $l$ containing a member of $U$
from the bodies of the remaining rules. By the Splitting Set Theorem [8], $M$ is
a stable model of $P_G$ only if $M = I \cup J$ where $I$ is a stable model of
$b_U(P_G)$ and $J$ is a stable model of $e_U(P_G \setminus b_U(P_G), I)$.

3 Omega-Restricted Logic Programs 

In this section we give a formal definition for ω-restricted programs. The main
idea is to construct a stratification of the predicate symbols such that a
predicate $p$ is on a higher level than a predicate $q$ if $p$ is defined in
terms of $q$. We start by formalizing the concept of dependency between
predicate symbols.

Definition 1. Let $P$ be a logic program. Then, the one-step dependency relation
$D_1(P) \subseteq \mathit{preds}(P) \times \mathit{preds}(P)$ is defined as follows:

1. $D_1^+(P) = \{\langle \mathit{pred}(a), \mathit{pred}(l) \rangle \mid \exists R \in P : a = \mathit{head}(R) \wedge l \in \mathit{body}^+(R)\}$

2. $D_1^-(P) = \{\langle \mathit{pred}(a), \mathit{pred}(l) \rangle \mid \exists R \in P : a = \mathit{head}(R) \wedge l \in \mathit{body}^-(R)\}$

3. $D_1(P) = D_1^+(P) \cup D_1^-(P)$.

The one-step dependency relation may be drawn as a graph. For example, the 
dependency graph of Program (1) is shown in Figure 1. 

We now generalize the one-step dependency relation to a full dependency
relation. The intuition is that a predicate $p$ depends on a predicate $q$ if
there is a path from $p$ to $q$ in the dependency graph. If at least one of the
edges between $p$ and $q$ is negative, then $p$ depends negatively on $q$.

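Definition 2. A dependency path $\pi_P$ of a logic program $P$ is a sequence

\[ \pi_P = \langle p_1, p_2, \ldots, p_n \rangle \tag{10} \]

where $p_i \in \mathit{preds}(P)$ for $1 \le i \le n$ and
$\langle p_j, p_{j+1} \rangle \in D_1(P)$ for $1 \le j < n$. A path $\pi_P$ is
negative (denoted by $\pi_P^-$) if and only if
$\langle p_j, p_{j+1} \rangle \in D_1^-(P)$ for some $1 \le j < n$.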

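Definition 3. The dependency relation
$D(P) \subseteq \mathit{preds}(P) \times \mathit{preds}(P)$ of a logic
program $P$ is defined as follows:

1. $D^+(P) = \{\langle p, q \rangle \mid \exists \pi_P : \pi_P = \langle p, \ldots, q \rangle\}$;

2. $D^-(P) = \{\langle p, q \rangle \mid \exists \pi_P^- : \pi_P^- = \langle p, \ldots, q \rangle\}$; and

3. $D(P) = D^+(P) \cup D^-(P)$.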
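
Definitions 1 to 3 can be illustrated on Program (1). The Python fragment below is a hedged sketch: the edge list is our own transcription of the one-step dependencies of Program (1), and the fixpoint loop is a naive closure chosen for brevity rather than efficiency.

```python
# One-step dependencies of Program (1): (head predicate, body predicate, sign).
# "+" marks a positive body literal, "-" a negative one.
EDGES = [
    ("odd", "number", "+"), ("odd", "even", "+"),
    ("even", "number", "+"), ("even", "odd", "+"),
    ("two-divides", "even", "+"),
    ("interesting", "number", "+"), ("interesting", "dull", "-"),
    ("dull", "number", "+"), ("dull", "interesting", "-"),
    ("interesting-odd", "odd", "+"), ("interesting-odd", "interesting", "+"),
]


def dependency_relation(edges):
    """Return (D_plus, D_minus): pairs joined by a path, and by a negative path."""
    d_plus = {(p, q) for p, q, _ in edges}
    d_minus = {(p, q) for p, q, sign in edges if sign == "-"}
    changed = True
    while changed:
        changed = False
        for p, q, sign in edges:
            # Prepend the edge p -> q to every known path q -> end.
            for start, end in list(d_plus):
                if start != q:
                    continue
                new_pair = (p, end)
                negative = sign == "-" or (start, end) in d_minus
                if new_pair not in d_plus:
                    d_plus.add(new_pair)
                    changed = True
                if negative and new_pair not in d_minus:
                    d_minus.add(new_pair)
                    changed = True
    return d_plus, d_minus


D_PLUS, D_MINUS = dependency_relation(EDGES)
```

With this encoding, interesting depends negatively on itself (through dull), while even and odd depend on each other only through positive edges.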

Next, we define the concept of ω-stratification. The definition extends the
traditional definition of stratification [1] by adding a new stratum, the
ω-stratum, for the predicates that depend negatively on each other.

Definition 4. An ω-stratification of a program $P$ is a function
$S : \mathit{preds}(P) \to \mathbb{N} \cup \{\omega\}$ such that:

1. $\forall p_1 \forall p_2 \, (\langle p_1, p_2 \rangle \in D^+(P) \Rightarrow S(p_1) \ge S(p_2))$; and

2. $\forall p_1 \forall p_2 \, (\langle p_1, p_2 \rangle \in D^-(P) \Rightarrow S(p_1) > S(p_2) \vee S(p_1) = \omega)$.

We use the convention that $\omega > n$ for all $n \in \mathbb{N}$. The first
condition asserts that a predicate $p_1$ that depends positively on a predicate
$p_2$ has to be on at least as high a stratum as $p_2$. The second condition
states that if $p_1$ depends negatively on $p_2$, then $p_1$ has to be on a
higher stratum or they both must be in the ω-stratum. In practice, we are
interested in stratifications that are strict in the sense that
$S(p_1) > S(p_2)$ whenever $p_1$ depends on $p_2$ but not vice versa.

Example 2. Consider Program (1). We can construct an ω-stratification $S$ for it
by looking at its dependency graph. As there are no edges leading from number,
we can set S(number) = 0. Predicates even and odd depend on number and each
other, so we set S(even) = S(odd) = 1. Continuing this, two-divides depends
on even so S(two-divides) = 2. The negative cycle of interesting and dull forces
S(interesting) = S(dull) = S(interesting-odd) = ω. This stratification is
shown in Figure 2.

Fig. 2. A stratification of Program (1)



Next, we will extend the ω-stratification to cover also rules and variables by
defining the concept of an ω-valuation.

Definition 5. The ω-valuation of a rule $R$ under an ω-stratification $S$ is the
function:

\[ \Omega(R, S) = S(\mathit{pred}(\mathit{head}(R))) \]

The ω-valuation of a variable $v$ in a rule $R$ under an ω-stratification $S$ is
the function:

\[ \Omega(v, R, S) = \min(\{S(\mathit{pred}(a)) \mid a \in \mathit{body}^+(R) \wedge v \in V(a)\} \cup \{\omega\}) \]
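Example 3. Let $S$ be as defined in Example 2. Consider the rule

    R : interesting(x) ← number(x), not dull(x).

Now

\[
\begin{aligned}
\Omega(R, S) &= S(\mathit{interesting}) = \omega \\
\Omega(x, R, S) &= \min\{S(\mathit{number}), \omega\} = \min\{0, \omega\} = 0.
\end{aligned}
\]

Only the positive body literal number(x) contributes to $\Omega(x, R, S)$; the
negative literal not dull(x) is ignored by Definition 5.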

A rule is ω-restricted if all variables that occur in it also occur in a positive
body literal that belongs to a strictly lower stratum than the head.

Definition 6. Let $R$ be a rule in a logic program $P$. Then $R$ is ω-restricted
if and only if there exists an ω-stratification $S$ such that:

\[ \forall v \in V(R) : \Omega(v, R, S) < \Omega(R, S). \]

Definition 7. Let $P$ be a logic program. Then $P$ is ω-restricted if and only if
all rules $R \in P$ are ω-restricted.

Example 4. Consider the rule:

    s(x + 1) ← s(x)

This rule is not ω-restricted since for all ω-stratifications $S$,
$\Omega(R, S) = \Omega(x, R, S)$.
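
Definitions 5 and 6 translate directly into a test for a single rule under a given stratification. The Python sketch below makes several assumptions of its own: the rule encoding as a tuple, the use of floating-point infinity for ω, and the fact that it checks one fixed stratification instead of searching for one as Definition 6 requires.

```python
OMEGA = float("inf")  # stands in for the omega stratum; OMEGA < OMEGA is False

# The stratification of Example 2.
S = {"number": 0, "even": 1, "odd": 1, "two-divides": 2,
     "interesting": OMEGA, "dull": OMEGA, "interesting-odd": OMEGA}


def is_omega_restricted(rule, strat):
    """rule = (head_pred, head_vars, positive_body, negative_body);
    each body literal is a pair (pred, vars)."""
    head_pred, head_vars, pos, neg = rule
    rule_value = strat[head_pred]                      # Omega(R, S)
    variables = set(head_vars)
    for _, literal_vars in pos + neg:
        variables |= set(literal_vars)
    for v in variables:
        # Omega(v, R, S): lowest stratum of a positive body literal containing v.
        var_value = min([strat[p] for p, vs in pos if v in vs] + [OMEGA])
        if not var_value < rule_value:
            return False
    return True


# interesting(x) <- number(x), not dull(x)     (Example 3)
R_INTERESTING = ("interesting", ["x"], [("number", ["x"])], [("dull", ["x"])])
# odd(x + 1) <- number(x), even(x)             (from Program (1))
R_ODD = ("odd", ["x"], [("number", ["x"]), ("even", ["x"])], [])
# s(x + 1) <- s(x)                             (Example 4)
R_S = ("s", ["x"], [("s", ["x"])], [])

RESULTS = (
    is_omega_restricted(R_INTERESTING, S),
    is_omega_restricted(R_ODD, S),
    is_omega_restricted(R_S, {"s": 0}),
    is_omega_restricted(R_S, {"s": OMEGA}),
)
```

The last two calls mirror Example 4: whichever stratum s is placed on, the variable x receives the same value as the rule itself, so the strict inequality fails.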

Finally, we divide the predicate symbols into two classes, domain predicates
that are on finite strata and non-domain predicates that are on the ω-stratum.
The domain predicates are defined by the maximal stratifiable subset of the rules
of the program.

Definition 8. Let $P$ be an ω-restricted program. Then a predicate
$p \in \mathit{preds}(P)$ is a domain predicate if and only if there exists an
ω-stratification $S$ such that $S(p) < \omega$. The set of rules defining domain
predicates of $P$ is denoted by $\mathcal{D}(P)$.

4 Domain Predicates and Instantiation 

The subprogram $\mathcal{D}(P)$ defining the domain predicates of $P$ is
stratified, so it has a unique least model $M_{\mathcal{D}(P)}$. It is easy to
verify that $M_{\mathcal{D}(P)}$ is a splitting set of $P$. Thus, we can compute
the stable models of $P$ by first computing $M_{\mathcal{D}(P)}$ and then
extending it to cover the atoms on the ω-stratum. Moreover, each variable that
occurs in a rule $R$ also occurs in a positive domain literal, so we can create
all relevant ground instances of $R$ by computing the natural join of the
extensions of the domain literals in $\mathit{body}(R)$.

We can compute $M_{\mathcal{D}(P)}$ and the instantiation $P_{N_G}$ of
$P_N = P \setminus \mathcal{D}(P)$ using the following algorithm:

1. Find all strongly connected components of the dependency graph of $P$. Each
   component becomes a new stratum with the exception that all components
   that have a path to a negative dependency cycle are put on the ω-stratum.
   Order the different strata by doing a depth-first search over the strongly
   connected components.

2. Instantiate the predicates on finite strata starting from the lowest one. After
   instantiation, compute the deductive closure of the new ground rules and
   store the resulting atoms as facts in a database. These facts are then used
   to give domains for variables when we instantiate the rules on higher strata.

3. Finally, instantiate all rules on the ω-stratum and output them along with
   the domain facts.
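
Step 1 of the algorithm can be sketched in Python as follows. This is an illustration under our own assumptions, not the Smodels implementation: the edge list is our transcription of Program (1), components are found through a naive transitive closure instead of a dedicated strongly-connected-components algorithm, and strata are numbered by the longest chain of components below a predicate.

```python
OMEGA = float("inf")  # stands in for the omega stratum

# One-step dependencies of Program (1): (head predicate, body predicate, sign).
EDGES = [
    ("odd", "number", "+"), ("odd", "even", "+"),
    ("even", "number", "+"), ("even", "odd", "+"),
    ("two-divides", "even", "+"),
    ("interesting", "number", "+"), ("interesting", "dull", "-"),
    ("dull", "number", "+"), ("dull", "interesting", "-"),
    ("interesting-odd", "odd", "+"), ("interesting-odd", "interesting", "+"),
]


def stratify(edges):
    preds = {p for p, _, _ in edges} | {q for _, q, _ in edges}
    # reach[p]: predicates reachable from p by at least one edge.
    reach = {p: {q for p2, q, _ in edges if p2 == p} for p in preds}
    changed = True
    while changed:
        changed = False
        for p in preds:
            new = set().union(*(reach[q] for q in reach[p])) - reach[p]
            if new:
                reach[p] |= new
                changed = True

    # Sources of negative edges that lie on a cycle; every other member of
    # such a cycle reaches one of them and is caught by the next step.
    on_negative_cycle = {p for p, q, sign in edges
                         if sign == "-" and (p == q or p in reach[q])}
    omega = {p for p in preds
             if p in on_negative_cycle or reach[p] & on_negative_cycle}

    strata = {}

    def level(p):
        if p in omega:
            return OMEGA
        if p not in strata:
            # Predicates in strictly lower components: reachable from p,
            # but p is not reachable from them.
            below = [q for q in reach[p] if p not in reach[q]]
            strata[p] = 1 + max((level(q) for q in below), default=-1)
        return strata[p]

    return {p: level(p) for p in preds}


STRATA = stratify(EDGES)
```

On Program (1) this reproduces the stratification of Example 2, which is the input that steps 2 and 3 would then process stratum by stratum.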

5 Computational Complexity 

In this section we examine the computational complexity of ω-restricted
programs. We are interested in two problems:

- In INSTANTIATION we have an ω-restricted program $P$ and a ground atom
  $p(t_1, \ldots, t_n)$ and we want to find whether one of the following
  conditions holds:

  1. $p(t_1, \ldots, t_n) \in M_{\mathcal{D}(P)}$; or

  2. There is a rule $p(t_1, \ldots, t_n) \leftarrow l_1, \ldots, l_n$ in $P_{N_G}$.

- In MODEL we want to find out whether an ω-restricted program $P$ has an
  answer set.

Table 1. Computational complexity

                                          INSTANTIATION     MODEL
  No variables                            -                 NP-complete
  Fixed variables       No functions      P-complete        NP-complete
                        With functions    EXP-complete      NEXP-complete
  Unlimited variables   No functions      EXP-complete      NEXP-complete
                        With functions    2-EXP-complete    2-NEXP-complete

The instantiation complexity is included in the model complexity in all cases
since we may have to construct the full instantiation of a program before we
know whether it has any stable models at all.

In addition to proving complexity results for the whole class of ω-restricted
programs, we examine how the computational complexities of INSTANTIATION
and MODEL change when we restrict our attention to some subclasses of
programs. We use two parameters to divide the ω-restricted programs into four
classes:

- The maximum number of variables in a rule is either fixed to some constant $d$
  or it is unlimited; and

- Function symbols are either allowed or not.

The main complexity results are presented in Table 1. The model complexities
of function-free normal logic programs with the stable model semantics have
been presented in earlier literature [10,4]. The corresponding complexity classes
of function-free ω-restricted programs are the same, so we see that at least in
these categories ω-restricted programs are as expressive as normal logic
programs. Since the MODEL problem of the unrestricted case is 2-NEXP-complete,
we know that ω-restricted programs are decidable:

Theorem 1. Both INSTANTIATION and MODEL are decidable for ω-restricted
programs.



5.1 Turing Machine Translation 

Most of the complexity results of this work are derived by proving that the 
computations of a deterministic Turing machine M can be simulated by a logic 
program P such that the size of P is polynomial with respect to the size of M. 

Definition 9. A deterministic Turing machine is a tuple
$M = \langle K, \Sigma, \delta, s \rangle$ where $K$ is a finite set of states,
$\Sigma$ is a finite alphabet containing the blank symbol $\sqcup$, $s \in K$
is the initial state and $\delta$ is a transition function
$\delta : K \times \Sigma \to (K \cup \{y, n\}) \times \Sigma \times \{-1, 0, 1\}$.

A computation of a Turing machine $M$ given an input $x$ starts from the
configuration $(s, \sqcup x)$ and each computation step yields a new
configuration according to $\delta$ until one of the halting states $y$ (accept)
or $n$ (reject) is reached.

We encode the states of a Turing machine $M$ using the predicate state(q), the
alphabet using symbol(σ), and the transitions using
transition(q_1, σ_1, q_2, σ_2, d), where $d \in \{-1, 0, +1\}$. The atom
at-place(σ, p, t) is used to denote that the input tape cell p contains the
symbol σ at the time step t. The predicate current-state(q, p, σ, t) indicates
that the machine is in the state q and the head is over the tape cell p looking
at the symbol σ at the time step t.

We encode one computation step using the two rules:

    at-place(s_2, p, t + 1) ← transition(q_1, s_1, q_2, s_2, d),
                              current-state(q_1, p, s_1, t),             (11)
                              place(p), time(t)

    current-state(q_2, p + d, s_3, t + 1) ← transition(q_1, s_1, q_2, s_2, d),
                                            current-state(q_1, p, s_1, t),  (12)
                                            at-place(s_3, p + d, t), time(t),
                                            place(p), symbol(s_3).

Here we have used the notation t + 1 to denote the successor of t. How the
successor relation is actually defined depends on the program class that we want
to examine. The same holds for the predecessor relation that is used in
the case of p − 1.

The rules above handle the cell where the read/write head is currently
positioned. In addition, we have to assert that the state of the other tape
cells stays constant:

    at-place(s_1, p_1, t + 1) ← current-state(q, p_2, s_2, t),
                                at-place(s_1, p_1, t), time(t),
                                symbol(s_1), symbol(s_2), state(q),      (13)
                                place(p_1), place(p_2), not equal(p_1, p_2).

In the initial configuration all tape cells that are not part of the input are empty:

    at-place(⊔, p, 1) ← place(p), not part-of-input(p).                  (14)

The first $|x|$ tape cells are initialized from the input and they also belong
to the extension of part-of-input/1. Finally, we want to recognize whether the
Turing machine halts in an accepting state or not:

    accept ← current-state(y, p, s, t), place(p), symbol(s), time(t)
    reject ← current-state(n, p, s, t), place(p), symbol(s), time(t).    (15)



Note that we have not yet given definitions for the predicates time/1 and
place/1 that encode the time steps and tape cells. In the following complexity
proofs we show how we can define them in a polynomial number of rules using
tools that are available for the four different ω-restricted program classes.

Since all rules except (13) and (14) in the translation are negation-free and
both negations are over a predicate with a fixed extension that is linear in the
size of the input program, the least model of the instantiation can be found in
linear time with respect to its size [5]. All predicates are domain predicates
and we can easily find whether accept or reject is true in $M_{\mathcal{D}(P)}$.

We can generalize the translation to allow non-deterministic Turing machines 
by forcing the machine to choose between possible transitions at all computation 
steps. Due to space constraints, we do not include the details here but one 
possible translation has been presented by V. W. Marek and J. B. Remmel [9]. 
The existence of such a translation is enough to prove the following lemma: 

Lemma 1. If the INSTANTIATION problem of a subclass of the ω-restricted
programs is C-complete for some complexity class C, the corresponding MODEL
problem is NC-complete.

5.2 Complexity Results 

Theorem 2. The INSTANTIATION of an ω-restricted program is P-complete
when the number $d$ of variables occurring in it is fixed ($d > 5$) and no
function symbols are allowed.

Proof. We construct the proof in two parts:

(a) Inclusion. Let $P$ be a program with $d$ distinct variables. Then, each rule
    has at most $n^d$ ground instances, where $n$ is the number of constants in
    the program. This bound is polynomial in $n$ because $d$ is fixed.

(b) Hardness. The P-complete problem Boolean circuit value [11] can be
    expressed as an ω-restricted logic program as follows:

        true(G) ← nand-gate(G, L, R), false(L)
        true(G) ← nand-gate(G, L, R), false(R)                           (16)
        false(G) ← nand-gate(G, L, R), true(L), true(R).

    Here we suppose that the Boolean circuit is implemented using only not-and
    gates and that the truth values of the input gates are given as facts.
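
To make the reduction concrete, the three rules in (16) can be evaluated by forward chaining. The Python sketch below is only an illustration; the dictionary encoding of nand-gate/3 and the example circuit (an exclusive-or built from four not-and gates) are our own assumptions.

```python
def evaluate_circuit(gates, inputs):
    """Forward-chain rules (16).

    gates:  {gate: (left, right)}, one entry per nand-gate(G, L, R) fact
    inputs: {gate: bool}, the truth values of the input gates
    """
    true = {g for g, value in inputs.items() if value}
    false = {g for g, value in inputs.items() if not value}
    changed = True
    while changed:
        changed = False
        for gate, (left, right) in gates.items():
            # true(G) <- nand-gate(G, L, R), false(L)   (and the same for R)
            if gate not in true and (left in false or right in false):
                true.add(gate)
                changed = True
            # false(G) <- nand-gate(G, L, R), true(L), true(R)
            if gate not in false and left in true and right in true:
                false.add(gate)
                changed = True
    return true, false


# Exclusive-or of a and b built from four not-and gates.
XOR_GATES = {
    "n1": ("a", "b"),
    "n2": ("a", "n1"),
    "n3": ("b", "n1"),
    "out": ("n2", "n3"),
}

TRUE_1, FALSE_1 = evaluate_circuit(XOR_GATES, {"a": True, "b": False})
TRUE_2, FALSE_2 = evaluate_circuit(XOR_GATES, {"a": True, "b": True})
```

Each gate is derived as true or false exactly once, so the loop terminates after at most one pass per gate plus a final pass that changes nothing.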

Corollary 1. The MODEL problem for a fixed number $d$ of variables and no
function symbols is NP-complete, if $d > 5$.

Theorem 3. The INSTANTIATION of an unlimited-variable ω-restricted program
is EXP-complete if no function symbols are allowed.

Proof. For inclusion, see Dantsin et al. [4]. The hardness can be proved by noting
that a deterministic EXP-time Turing machine $M$ uses at most $2^{n^k}$ time
steps for some $k$ when the length of the input is $n$. We have to show that we
can generate an exponential number of atoms representing time steps and tape
cells using a program whose size is polynomial with respect to the size of $M$.
To do this, we need to implement an $n^k$-bit binary counter that runs from 0 to
$2^{n^k} - 1$. This can be done by encoding the numbers as vectors of binary
variables:

    number(0, ..., 0) ←
    number(y_1, ..., y_{n^k}) ← bit(y_1), ..., bit(y_{n^k}),
                                number(x_1, ..., x_{n^k}),               (17)
                                next(x_1, ..., x_{n^k}, y_1, ..., y_{n^k}).



The predicate bit/1 is an auxiliary predicate with the extension {bit(0), bit(1)}
that is used to ensure that the rule is ω-restricted. The successor relation can
be encoded with the rule:

    next(x_1, ..., x_{n^k}, y_1, ..., y_{n^k}) ← add(x_1, 1, y_1, c_1),
                                                 add(x_2, c_1, y_2, c_2),
                                                 ...,                       (18)
                                                 add(x_{n^k}, c_{n^k - 1}, y_{n^k}, c_{n^k})

where add/4 is defined using the following four facts:

    add(0, 1, 1, 0) ←        add(0, 0, 0, 0) ←
    add(1, 0, 1, 0) ←        add(1, 1, 0, 1) ← .                         (19)

We can implement the predecessor function by switching the arguments of the
next predicate. Now the time steps and tape positions can be defined in terms
of numbers:

    time(x_1, ..., x_{n^k}) ← number(x_1, ..., x_{n^k})
    place(x_1, ..., x_{n^k}) ← number(x_1, ..., x_{n^k}).                (20)

Finally, we replace all references to time/1 and place/1 by time/$n^k$ and
place/$n^k$ and add all necessary domain predicates to the rule bodies.
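
The counter construction can be checked on a small width. In the Python sketch below, the add/4 facts are read as add(x, carry-in, y, carry-out) with the first vector position as the least significant bit; this reading, the function names, and the fixed width of three bits are our own assumptions for illustration.

```python
from itertools import product

# The four add/4 facts (19), read as add(x, carry_in, y, carry_out).
ADD_FACTS = [(0, 1, 1, 0), (0, 0, 0, 0), (1, 0, 1, 0), (1, 1, 0, 1)]
ADD_TABLE = {(x, c_in): (y, c_out) for x, c_in, y, c_out in ADD_FACTS}


def next_relation(width):
    """All pairs (x, y) of bit vectors that satisfy the body of rule (18)."""
    pairs = {}
    for x in product((0, 1), repeat=width):
        y = []
        carry = 1  # the first add atom has the constant 1 as its carry-in
        for bit in x:
            y_bit, carry = ADD_TABLE[(bit, carry)]
            y.append(y_bit)
        pairs[x] = tuple(y)
    return pairs


def numbers(width):
    """Atoms derived by rules (17): start at 0...0 and follow next."""
    succ = next_relation(width)
    current = (0,) * width
    derived = []
    while current not in derived:
        derived.append(current)
        current = succ[current]
    return derived


def to_int(bits):
    """Integer value of a bit vector whose first position is least significant."""
    return sum(bit << i for i, bit in enumerate(bits))


NUMBERS_3 = [to_int(bits) for bits in numbers(3)]
```

The final carry is discarded, so the successor of the all-ones vector wraps around to the all-zeros vector, which has already been derived.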



Theorem 4. The INSTANTIATION of a fixed-variable ω-restricted program that
uses function symbols is EXP-complete, if $d > 8$.



Proof. 

(a) Inclusion. Without loss of generality we may assume that there are $k$ strata
    with $c$ rules each in $P$. Let us use $a_n$ to denote the number of ground
    instances of rules that belong to the first $n$ strata.

    Since the number $d$ of variables is fixed, a rule on the stratum $n + 1$ may
    have at most $a_n^d$ ground instances. Now we can establish an upper bound for
    the number of ground instances of rules on the stratum $n + 1$ or lower:

\[
\begin{aligned}
a_{n+1} &= c \cdot a_n^d + a_n \le c \cdot a_n^d + c \cdot a_n^d \quad (\text{when } d > 1) \\
        &= 2c \cdot a_n^d \\
        &\le (2c)^{d^n + d^{n-1} + \cdots + d + 1} \\
        &= 2^{(\log_2 c + 1) d^n + (\log_2 c + 1) d^{n-1} + \cdots + (\log_2 c + 1)}
\end{aligned}
\tag{21}
\]

    As both $c$ and $d$ are linear with respect to the size of the program, $a_n$
    grows as $O(2^{n^{2k}})$, so the problem is in EXP.

(b) Hardness. As in the proof of Theorem 3, we need only to construct a binary
    counter from 0 up to $2^{n^k} - 1$. We do this by encoding an $m$-bit binary
    number $x$ as a function $b_1(b_2(\cdots b_m(0) \cdots))$ where $b_i$ is $f$
    if the $i$th bit of $x$ is 0 and $t$ if it is 1. The $m$-bit binary numbers
    can be generated recursively from $(m-1)$-bit numbers by the following two
    rules:

        number_m(t(x)) ← number_{m-1}(x)
        number_m(f(x)) ← number_{m-1}(x)                                 (22)

    Here we need $m + 1$ different number predicates since otherwise the rules
    would not be ω-restricted. As the base case of the recursion, we define
    one 0-bit number as:

        number_0(0) ← .                                                  (23)

    The successor relation can also be defined recursively:

        next_m(t(x), t(y)) ← next_{m-1}(x, y)
        next_m(f(x), f(y)) ← next_{m-1}(x, y)                            (24)
        next_m(f(x), t(y)) ← last_{m-1}(x), first_{m-1}(y)

    where last_m/1 and first_m/1 are defined as:



lastmif^iO)) 



    The translation uses $7n^k + 3$ rules to create all $n^k$-bit numbers, so we
    now have a polynomial reduction from EXP-time Turing machines to ω-restricted
    programs using only function symbols and the proof is completed.

Corollary 2. The MODEL problem of a fixed-variable ω-restricted program that
uses function symbols is NEXP-complete, if $d > 8$.

Theorem 5. The INSTANTIATION of an ω-restricted program is 2-EXP-complete.

Proof. We can combine the proofs of Theorems 3 and 4 to see that the problem
is in 2-EXP and that it is possible to implement all $2^{n^k}$-bit integers by
putting together the two different exponential constructions.

Corollary 3. The MODEL problem of an ω-restricted program is 2-NEXP-complete.

6 Conclusions

We defined a new class of logic programs, ω-restricted programs, that are
decidable even when function symbols are used. We showed that the computational
complexity of the program class is 2-NEXP-complete. If we make further
restrictions either by fixing the maximum number of variables that may occur in
a rule or by disallowing the function symbols, the complexity drops to
NEXP-complete. If both restrictions are in effect, the complexity stays NP-complete.
We have implemented the ω-restricted programs in the Smodels system.

Abstract. Most Answer Set Programming (ASP) systems, including
DLV and Smodels, are endowed with an instantiation module. The
instantiator generates a new program which is equivalent to the input
program, but does not contain any variables (i.e., it is ground). Normal
(i.e., disjunction-free) stratified programs are completely solved by the
instantiator, which generates the output model directly.

The instantiation process may be computationally expensive in some
cases, and the instantiator is crucial for the efficiency of the entire ASP
system. In this paper, we propose to employ join-ordering techniques to
improve the instantiation process. We design a new join-ordering method,
and adapt a classical database method to this context. We implement
these techniques in the ASP system DLV, and we carry out experiments
on a collection of benchmark problems taken from different domains.
The results of the experiments are very positive: the new techniques
appreciably improve the efficiency of the DLV system, whose
instantiation module is confirmed to be a major strength of DLV with
respect to the other ASP systems.



1 Introduction 

The recent implementation of knowledge base systems which efficiently support
expressive logic-based languages, like DLV [5], Smodels [14], DCS [1], XSB [17],
QUIP [2], and CCALC [13], has renewed the interest in the area of
nonmonotonic reasoning and declarative logic programming. The advances made in
this area allow us to use ASP systems, like DLV and Smodels, for solving
real-world problems in a number of application areas, including planning and
scheduling, as well as for complex data manipulations [3] [19]. For instance,
Smodels is being used for the automatic configuration of software distributions,
while the latest application of DLV, issued by the Italian national statistics
institute (ISTAT), concerns the automatic correction of census data.

These systems support a fully declarative programming style, called Answer
Set Programming (ASP). The knowledge representation language of ASP is very
expressive: function-free logic programs where nonmonotonic negation may occur
in the bodies of the rules, and possibly (i.e., for some systems) with classical
negation and disjunction in the heads of the rules. The semantics of an ASP
program $\mathcal{P}$ is given by its answer sets [10], which are subset-minimal
models of $\mathcal{P}$, and are “grounded” in a precise sense. The idea of
answer set programming is to represent a given computational problem by an ASP
program whose answer sets correspond to solutions, and then use an answer set
solver to find such a solution [12].

As an example, consider the well-known problem of 3-colorability, which is
the assignment of three colors to the nodes of a graph in such a way that adjacent
nodes have different colors. This problem is known to be NP-complete. Suppose
that the nodes and the arcs are represented by a set $F$ of facts with predicates
node (unary) and arc (binary), respectively (node and arc can be stored in the
tables representing the input database). Then, the following ASP program allows
us to determine the admissible ways of coloring the given graph.

    r_1 : color(X, r) ∨ color(X, y) ∨ color(X, g) ← node(X)

    r_2 : ← arc(X, Y), color(X, C), color(Y, C)

Rule $r_1$ above states that every node of the graph is colored red or yellow or
green, while $r_2$ forbids the assignment of the same color to any adjacent nodes.
The minimality of answer sets guarantees that every node is assigned only one
color. Thus, there is a one-to-one correspondence between the solutions of the
3-coloring problem and the answer sets of $F \cup \{r_1, r_2\}$. The graph is
3-colorable if and only if $F \cup \{r_1, r_2\}$ has some answer set.
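
The correspondence between colorings and answer sets can be illustrated without an ASP solver. The following Python sketch is a brute-force stand-in under our own assumptions (graphs given as Python lists, all names hypothetical); it enumerates exactly the assignments that the two rules admit, not the way DLV computes them.

```python
from itertools import product


def colorings(nodes, arcs, colors=("r", "y", "g")):
    """Enumerate the admissible colorings of the graph.

    Giving each node exactly one color mirrors the disjunctive rule together
    with the minimality of answer sets; the filter mirrors the constraint
    that forbids equal colors on adjacent nodes.
    """
    admissible = []
    for assignment in product(colors, repeat=len(nodes)):
        color = dict(zip(nodes, assignment))
        if all(color[x] != color[y] for x, y in arcs):
            admissible.append(color)
    return admissible


TRIANGLE = colorings(["a", "b", "c"], [("a", "b"), ("b", "c"), ("a", "c")])
K4 = colorings(
    ["a", "b", "c", "d"],
    [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d")],
)
```

A triangle admits six colorings, one per permutation of the three colors, whereas the complete graph on four nodes admits none and so corresponds to a program without answer sets.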

ASP is very expressive: every problem in the complexity class $\Sigma_2^P$
(i.e., in $\mathrm{NP}^{\mathrm{NP}}$) can be directly encoded in an ASP program
which can then be used to solve all problem instances in a uniform way [4]. The
high expressiveness of answer set programming comes at the price of a high
computational cost in the worst case. Indeed, computing an answer set of a
disjunctive (resp., normal) propositional ASP program is $\Sigma_2^P$-hard
(resp., NP-hard). The design and the implementation of suitable optimization
techniques is therefore fundamental for ASP systems.

The kernel modules of the ASP systems operate on a ground instantiation
of the input program, i.e., a program that does not contain any variables, but
is (semantically) equivalent to the original input [5]. Therefore, an efficient
instantiation procedure is of utmost importance.¹ The efficiency of an
instantiation procedure can be measured in terms of the size of its output and
the time needed to generate this instantiation. In a previous work, the DLV team
has presented some rewriting techniques which reduce the size of the generated
ground instantiation [6]. In this paper we optimize the execution time needed to
generate the ground instantiation. The main contributions of the paper are the
following:

- We propose the use of join-ordering techniques to improve the efficiency of
  the instantiation procedures of ASP systems. In particular, a join-optimization
  technique can be employed to re-order the body literals of a rule during the
  instantiation process.

- We design a new join-ordering method, and adapt a classical database method
  to our context.

- We implement the above join-ordering methods in the ASP system DLV.

- To check the impact of our methods on the instantiator of DLV, we
  experimentally compare the techniques that we implemented.

- To assess the validity of our results more generally, we also compare the
  instantiator of DLV, resulting from our enhancements, to the newest version of
  the instantiator of Smodels, released in March 2001.

¹ Note that the disjunction-free stratified programs are “solved” by the instantiation
procedure, which provides the answer and does not generate any instantiation in this
case. Thus, the instantiator alone has the full power of a deductive database system.

The results of the experiments are very positive: the new techniques appear to
improve the efficiency of the DLV instantiator appreciably, and it compares
favourably against the instantiator of Smodels.

2 The Instantiation Procedure of DLV

In this section, we provide a short description of the overall instantiation module
of the DLV system, and focus on the “heart” procedure of this module, which
produces all ground instances of a given rule. This procedure will be optimized
in the next sections through the introduction of the join-ordering methods. We
assume that
the reader is familiar with ASP syntax and semantics. An extensive description 
can be found in [10] and [-3]. 




Fig. 1. Architecture of DLV’s Instantiator 



The aim of the instantiator is mainly twofold: (i) to evaluate (∨-free) stratified
program components, and (ii) to generate the instantiation of disjunctive or
unstratified components (if the input program is disjunctive or unstratified).

In order to evaluate stratified programs (components) efficiently, DLV uses
an improved version of the generalized semi-naive technique [20] implemented
for the evaluation of linear and non-linear recursive rules.

If the input program is normal (i.e., ∨-free) and stratified, the instantiator
evaluates the program completely and no further module is employed after the
grounding; the program has a single answer set, namely the set of the facts and
the atoms derived by the instantiation procedure. If the input program is
disjunctive or unstratified, the instantiation procedure cannot evaluate the
program completely. However, the optimization techniques mentioned above are
useful to generate the instantiation of the non-monotonic part of the program
efficiently.
Two aspects are crucial for the instantiation: 

(a) the number of generated ground rules, 

(b) the time needed to generate such an instantiation. 

The size of the generated instantiation is important because it strongly influences
the computation time of the other modules of the system. A slower instantiation
procedure generating a smaller grounding may be preferable to a faster one
generating a large grounding. However, the time needed by the former cannot
be ignored; otherwise there would be no real gain in overall computation time.

The main reason for large groundings even for small input programs is that
each atom of a rule in $\mathcal{P}$ may be instantiated to many atoms in
$B_{\mathcal{P}}$, which leads to combinatorial explosion. However, most of
these atoms may not be derivable at all, and hence such instantiations do not
render applicable rules. The instantiator module generates ground instances of
rules containing only atoms which can possibly be derived from $\mathcal{P}$.

In Figure 1 we have depicted the general structure of the instantiator module.
An input program $\mathcal{P}$ is first analyzed by the parser, which also builds
the extensional database from the facts in the program, and encodes the rules in
the intensional database in a suitable way. Then, a rewriting procedure (see [6])
optimizes the rules in order to get an equivalent program $\mathcal{P}'$ that can
be instantiated more efficiently and that can lead to a smaller ground program.
The dependency graph (DG) builder computes the dependency graph of
$\mathcal{P}'$, its connected components, and a topological ordering of these
components. Finally, $\mathcal{P}'$ is instantiated one component at a time,
starting from the lowest components in the topological ordering, i.e., those
components that depend on no other component according to the dependency graph.

For space reasons we omit a detailed description of the whole instantiation
algorithm here. The interested reader can find the instantiation algorithm in the
technical report [7]. Below, we describe the process of rule instantiation, the
“heart” of the instantiation module, which we optimize in the next section by
introducing join-ordering methods.

Let us first introduce some notation. We denote by $H(r)$ the set
$\{a_1, \ldots, a_n\}$ of the head atoms, and by $B(r)$ the set
$\{b_1, \ldots, b_k, \neg b_{k+1}, \ldots, \neg b_m\}$ of the body literals.
$B^+(r)$ (resp., $B^-(r)$) denotes the set of atoms occurring positively (resp.,
negatively) in $B(r)$. For a literal $L$, $var(L)$ denotes the set of variables
occurring in $L$. For a conjunction (or a set) of literals $C$, $var(C)$ denotes the
set of variables occurring in the literals in $C$, and, for a rule $r$,
$var(r) = var(H(r)) \cup var(B(r))$.

The procedure InstantiateRule, shown in Figure 2, generates the ground
instances of a rule $r$ of a program $\mathcal{P}$. When this procedure is called
for the rule $r$,
Forward Procedure FirstMatch(θ: Substitution, A: Atom, var MatchFound: Boolean,
                             var θ': Substitution);
(* Given a partial substitution θ for the rule's variables, and an atom A of the
   body, the procedure computes the first tuple t of the relation corresponding
   to A which matches θ. It returns in θ' the extension of θ, where the free
   variables of A have been instantiated with the corresponding constants in t.
   The Boolean variable MatchFound is True iff such a matching tuple has been
   found; otherwise it is False, and θ' is meaningless. *)

Forward Procedure NextMatch(θ: Substitution, A: Atom, var MatchFound: Boolean,
                            var θ': Substitution);
(* Similar to FirstMatch, but finds the next matching tuple. *)

Function InstantiateConjunction(C: Conjunction; θ: Substitution): SetOfSubsts;
var MatchFound: Boolean; A: Atom; B: Conjunction; θ': Substitution; S: SetOfSubsts;
begin
    if C is empty  (* the end of the body has been reached,
                      θ is a legal substitution *)
        then return({θ});
    S := ∅; A := first_conjunct(C); B := rest_conjunct(C);
    FirstMatch(θ, A, MatchFound, θ');
    while MatchFound do
        S := S ∪ InstantiateConjunction(B, θ');
        NextMatch(θ, A, MatchFound, θ');
    end_while;
    return(S);
end;

Function InstantiateRule(r: Rule): SetOfGroundRules;
var θ: Substitution; S: SetOfSubsts;
begin
    Let B+ denote the conjunction of the positive literals in the body of r;
    Order_Body(B+);
    θ := empty_Substitution;
    S := InstantiateConjunction(B+, θ);
    return({γr | γ ∈ S});
end;

Fig. 2. The process of rule instantiation

for each atom $A$ occurring in the body of $r$, the set of ground instances $I_A$
for $A$ previously computed by the instantiator is collected in a relation $rel(A)$
that we call the extension of $A$. Each ground instance $a \in I_A$ corresponds to
a tuple in $rel(A)$ and vice versa. More precisely, each tuple of constants in the
relation $rel(A)$ corresponds to a substitution $\theta : var(A) \to U_{\mathcal{P}}$
such that $\theta A \in I_A$, and vice versa. Such a substitution $\theta$ is called
a valid substitution for $r$ with respect to the given extensions of the atoms
occurring in its body. Intuitively, InstantiateRule performs the natural join of
the relations associated with the positive body literals of the rule. Since the
rule is safe, each variable of the rule also appears in a positive body literal of
the rule. Therefore, such a join is in one-to-one correspondence with the set of
all ground instances of the rule which are constructible from the set of available
instances for the body atoms; each tuple of this join corresponds to a valid
substitution for $r$ with respect to the given extensions of atoms.

Roughly, InstantiateRule first orders the conjunction $B^+$ of the positive body
literals of the rule $r$ to be instantiated (by a call to procedure Order_Body,
which will be described in the next section), and then calls the function
InstantiateConjunction, which actually computes the legal instantiations of $B^+$.
This function starts from the first atom $A$ in the conjunction $B^+$. It finds, by a
call to FirstMatch, the first tuple $t$ matching $A$ in the relation $rel(A)$
associated with $A$, and binds its variables to the corresponding constants in $t$.
Then, by a recursive call to InstantiateConjunction itself, this function takes the
second atom, say $A'$, in $B^+$, and binds its free variables (note that some
variables of $A'$ are already bound, if they also appear in $A$) by finding the first
matching tuple in $rel(A')$. The process goes on until either (i) the end of the
conjunction has been reached ($C$ is empty in function InstantiateConjunction), or
(ii) no matching tuple is found for some body atom (a call to a match procedure
returned MatchFound = False). In the latter case, the (partial) substitution
$\theta$ at hand is not good, since no instance of the current atom agrees with
$\theta$. Therefore, the current run of function InstantiateConjunction terminates,
the calling function changes $\theta$ by finding another matching tuple, and
restarts the forward instantiation phase. In the former case (i.e., in case (i)),
the substitution at hand (the parameter $\theta$ previously computed by the matching
procedures) is returned, as it instantiates all variables of the rule and hence
induces a ground instance of the rule $r$. The calling function adds $\theta$ to
the set $S$ of the computed substitutions, and finds another match for the atom at
hand to generate further ground instances of $r$. The process terminates when no
more matches are found (i.e., no more ground instances can be generated).
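The backtracking scheme of InstantiateConjunction can be mirrored in a few lines
of Python. This is only a sketch under simplifying assumptions that are ours, not
the paper's: atoms contain variables only (no constants), an atom is a pair of a
predicate name and a tuple of variable names, and relations are plain lists of
tuples. The generator `matches` plays the role of FirstMatch and NextMatch
together; all names are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

Atom = Tuple[str, Tuple[str, ...]]   # (predicate name, variable names)
Subst = Dict[str, str]               # variable -> constant
Relations = Dict[str, List[Tuple[str, ...]]]


def matches(theta: Subst, atom: Atom, rel: Relations) -> Iterator[Subst]:
    """Yield each extension of theta that agrees with a tuple of the atom's relation."""
    pred, variables = atom
    for tup in rel[pred]:
        ext = dict(theta)
        # setdefault binds a free variable, or returns the existing binding.
        if all(ext.setdefault(v, c) == c for v, c in zip(variables, tup)):
            yield ext


def instantiate_conjunction(conj: List[Atom], theta: Subst, rel: Relations) -> List[Subst]:
    """Return all substitutions extending theta that satisfy every atom in conj."""
    if not conj:                      # end of the body: theta is a legal substitution
        return [theta]
    first, rest = conj[0], conj[1:]
    result: List[Subst] = []
    for ext in matches(theta, first, rel):
        result.extend(instantiate_conjunction(rest, ext, rel))
    return result


# Hypothetical extensions for a body p(X, Y), q(Y, Z).
relations = {
    "p": [("a", "b"), ("c", "d")],
    "q": [("b", "e"), ("b", "f"), ("x", "y")],
}
body = [("p", ("X", "Y")), ("q", ("Y", "Z"))]
substitutions = instantiate_conjunction(body, {}, relations)
```

Here the tuple ("c", "d") of p finds no partner in q, so the corresponding partial
substitution is abandoned, exactly as in case (ii) above.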

3 Join-Ordering Methods 

From the previous section, it should be clear that computing all the possible
instantiations of a rule, given the relations associated with the atoms occurring
in its body, is equivalent to computing all the answers of the conjunctive query
joining the relations of the positive literals of the rule's body. A key issue for
the efficient instantiation of (the non-trivial rule) $r$ is thus the optimal
ordering of literals in the body. This problem clearly corresponds to the choice
of an optimal execution ordering for the join operations in a conjunctive query.

A good ordering dramatically affects the overall computation time. Many 
relevant real-world examples containing large relations (see next section) cannot 
be solved without a suitable ordering of the body atoms. 

It is worth noting that in ASP programs we have to instantiate many rules and,
for recursive programs, we have to instantiate the same rule many times, possibly
with different relations (until we reach a fixpoint). Therefore, the ordering
procedure is called very often and should be efficient.

Let $r$ be a rule and $B^+$ the conjunction of the atoms occurring in its body.
Our procedure Order_Body gets $B^+$ as its input and modifies the ordering of
atoms in this conjunction in order to minimize the instantiation time of $r$.

To choose an optimal ordering, we exploit some information about the relations
associated with the atoms in $B^+$, also called the extensions of these atoms.
For each atom $A$ occurring in $B^+$, we know the number $T(A)$ of tuples in its
associated relation $rel(A)$ and, for each variable $X \in var(A)$, the number
$V(X, A)$ of distinct values for $X$ over $rel(A)$ (i.e., the number of tuples in
the projection of $rel(A)$ onto $X$).
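For concreteness, the two statistics can be computed from an explicit relation as
in the following Python sketch. The representation of a relation as a collection
of tuples, and the function name, are assumptions made only for illustration.

```python
def statistics(variables, tuples):
    """Return (T(A), {X: V(X, A)}) for an atom with the given variables.

    variables: the variable names of the atom, in argument order.
    tuples:    the tuples of rel(A); duplicates are ignored, as in a relation.
    """
    relation = set(tuples)
    t = len(relation)
    # V(X, A) is the size of the projection of rel(A) onto the column of X.
    v = {x: len({tup[i] for tup in relation}) for i, x in enumerate(variables)}
    return t, v


# Hypothetical extension of an atom A(X, Y).
t_a, v_a = statistics(("X", "Y"), [("a", "1"), ("a", "2"), ("b", "2")])
```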

Recall that the relations associated to the atoms of r change, in general, at 
each call to InstantiateRule(r) . Of course, there is more than one call only if r 
belongs to some recursive component of the program. 

In order to meet both the requirement of efficiency of the optimization procedure
and that of efficiency of the instantiation procedure, we employed a greedy
algorithm, very similar to the one used in traditional database systems for
selecting an optimal left-deep join tree for a given conjunctive query [9].
Roughly, at each step $i \geq 1$, we have placed the first $i-1$ atoms, and we
make a greedy choice to select the $i$th atom in the final ordering of $B^+$. This
atom, say $A$, is chosen if $A$ is minimal with respect to some selectivity
criterion.
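The greedy scheme itself is independent of the particular criterion. A minimal
Python sketch, in which the selectivity criterion is passed in as a function of
the atoms already placed and of a candidate atom (a hypothetical interface chosen
for illustration), might look as follows.

```python
def order_body(atoms, selectivity):
    """Greedily order atoms; selectivity(placed, atom) returns the index to minimize."""
    placed, remaining = [], list(atoms)
    while remaining:
        # Greedy choice of the next atom; ties are broken by input order.
        best = min(remaining, key=lambda a: selectivity(placed, a))
        placed.append(best)
        remaining.remove(best)
    return placed


# Toy criterion (assumption): prefer the atom with the smallest extension.
sizes = {"p": 30, "q": 6, "r": 300}
ordering = order_body(["p", "q", "r"], lambda placed, a: sizes[a])
```

Each step evaluates the criterion once per remaining atom, so the whole ordering
needs a quadratic number of evaluations in the body length and no join at all.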

For the sake of simplicity, we will use the name of an atom $A$ to represent
both the atom and its extension $rel(A)$, whenever no confusion arises. We will
denote by $B_{i-1}$ the set of the first $i-1$ atoms and by $rel(B_{i-1})$ (or,
simply, by $B_{i-1}$) the relation obtained by computing the join of all the
extensions of the first $i-1$ atoms. Hence, the number of tuples $T(B_{i-1})$ in
this relation is equal to the number of consistent substitutions for the atoms in
$B_{i-1}$. Moreover, we denote by $var(B_{i-1})$ the set of variables occurring in
$B_{i-1}$. These variables are called the bound variables at step $i$. Therefore,
for any bound variable $X \in var(B_{i-1})$, $V(X, B_{i-1})$ is the number of
distinct values that $X$ may take over the computed relation $B_{i-1}$ (or,
equivalently, over the set of consistent substitutions for the atoms in $B_{i-1}$).
We estimate these numbers during the ordering procedure from the statistics we
have for the single atoms, rather than explicitly computing them at each step.
Indeed, in this phase, we do not compute any join (or substitution).

We next describe three selectivity criteria that we implemented in the DLV 
system. The first is the one used in the current version of DLV [8], the second 
is an adapted version of a criterion used in the context of traditional database 
systems [9], and the third is specifically designed for our purposes. 

3.1 Old-DLV Criterion 

This is the simple method implemented in the current versions of DLV [8]. Let $D$
be the set of all atoms in $B^+ - B_{i-1}$ having some bound variable at this step,
i.e., having a variable in common with some atom in $B_{i-1}$.

We select the atom $A$ to be placed in the $i$th position of the ordered body as
follows:

- if $D \neq \emptyset$, then $A$ is the atom belonging to $D$ whose extension has
  the smallest cardinality over the atoms in $D$;

- otherwise, i.e., if no remaining atom has any bound variable (at step $i$), $A$
  is the atom whose extension has the smallest cardinality over all atoms in
  $B^+ - B_{i-1}$.

Therefore, this method gives the maximum priority to the binding of variables,
and then chooses on the basis of the cardinality of the extensions.

Example 1. Assume that we are computing the ground instantiation of a rule $r$
and that we have already placed the first $i-1$ atoms. Let $X$ and $Y$ be the bound
variables at step $i$, i.e., $var(B_{i-1}) = \{X, Y\}$. Moreover, let $P(X, X')$,
$Q(Y, Y')$, and $R(X, Y, X', Y')$ be the remaining atoms in the body of $r$, i.e.,
the atoms in $B(r) - B_{i-1}$, and assume the numbers of tuples in their extensions
are $T(P) = 30$, $T(Q) = 6$, and $T(R) = 300$, respectively.

The Old-DLV criterion first looks for atoms having some bound variable. In
this example, either $X$ or $Y$ occurs in every remaining atom. Thus, the Old-DLV
criterion chooses $Q(Y, Y')$ as the $i$th atom because its extension has the
smallest cardinality.
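The choice made in Example 1 can be reproduced with a short Python sketch of the
Old-DLV criterion. The data layout (a dictionary from atom names to their sets of
variables) and the function name are assumptions for illustration only.

```python
def old_dlv_choice(remaining, bound, size):
    """Pick the next atom according to the Old-DLV criterion.

    remaining: dict mapping each atom still to be placed to its set of variables
    bound:     set of variables bound by the atoms already placed
    size:      dict mapping each atom to T(A), the cardinality of its extension
    """
    # D: remaining atoms sharing at least one variable with the placed atoms.
    d = [a for a, variables in remaining.items() if variables & bound]
    candidates = d if d else list(remaining)
    return min(candidates, key=lambda a: size[a])


# Data of Example 1.
remaining = {"P": {"X", "X'"}, "Q": {"Y", "Y'"}, "R": {"X", "Y", "X'", "Y'"}}
size = {"P": 30, "Q": 6, "R": 300}
chosen = old_dlv_choice(remaining, {"X", "Y"}, size)
```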



3.2 Join Selectivity Criterion 



This method is widely used in relational database systems [9]. We take as the
$i$th atom in the ordered body the atom $A \in B^+ - B_{i-1}$ that minimizes the
following selectivity index:

    $sel_j(A) = T(B_{i-1} \bowtie A) / T(B_{i-1})$.

Thus, we take the atom $A$ which leads to the smallest intermediate relation size,
over all atoms that are still to be ordered.

The size of a join operation between two relations $R$ and $S$ is
$T(R) \cdot T(S)$ if they do not have any variable in common; otherwise, it is
estimated as follows:

    $T(R \bowtie S) = \frac{T(R) \cdot T(S)}{\prod_{X \in var(R) \cap var(S)} \max\{V(X, R), V(X, S)\}}$

where $\prod$ denotes the product operation.



Example 2. Consider again the rule $r$ in Example 1. We next show how the $i$th
atom is chosen according to the Join-selectivity criterion. In this case, we need
some additional statistics. Let $V(X, B_{i-1}) = 30$ and $V(Y, B_{i-1}) = 5$ be the
current estimates for the number of different values for the bound variables at
step $i$, i.e., for $X$ and $Y$. Moreover, assume the statistics for the bound
variables occurring in the remaining atoms are $V(X, P) = 30$, $V(Y, Q) = 5$,
$V(X, R) = 30$, and $V(Y, R) = 5$. Consider the atom $P(X, X')$. The
Join-selectivity criterion assigns to this atom the following selectivity index:

    $sel_j(P) = \frac{T(B_{i-1}) \cdot T(P)}{\max\{V(X, B_{i-1}), V(X, P)\}} \cdot \frac{1}{T(B_{i-1})} = \frac{30}{30} = 1$

Similarly, the other atoms get $sel_j(Q) = 1.2$ and $sel_j(R) = 2$. Thus, according
to the Join-selectivity criterion, the $i$th atom is $P(X, X')$.
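The figures of Example 2 can be checked with the following Python sketch of the
join-size estimate and of the resulting selectivity index. The value used for
$T(B_{i-1})$ is hypothetical (the example does not give it, and it cancels out of
the index); the function names and the dictionary layout are likewise assumptions.
Statistics for the non-bound variables are omitted because only shared variables
enter the estimate.

```python
from math import prod


def join_size(t_r, v_r, t_s, v_s):
    """Estimate T(R join S) from tuple counts and distinct-value counts."""
    shared = v_r.keys() & v_s.keys()
    # With no shared variable the product is empty (equal to 1): a cartesian product.
    return t_r * t_s / prod(max(v_r[x], v_s[x]) for x in shared)


def sel_j(t_b, v_b, t_a, v_a):
    """Join-selectivity index of atom A with respect to the placed atoms B."""
    return join_size(t_b, v_b, t_a, v_a) / t_b


t_b = 50                        # hypothetical T(B_{i-1}); it cancels out
v_b = {"X": 30, "Y": 5}         # V(X, B_{i-1}) and V(Y, B_{i-1})
atoms = {
    "P": (30, {"X": 30}),
    "Q": (6, {"Y": 5}),
    "R": (300, {"X": 30, "Y": 5}),
}
indices = {name: sel_j(t_b, v_b, t, v) for name, (t, v) in atoms.items()}
best = min(indices, key=indices.get)
```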
Note that the above estimation of the size of a join is based on the following
simplifying assumptions:

Containment of value sets. If $V(X, R) \leq V(X, S)$, then every possible value
for variable $X$ in $R$ is also a possible value for $X$ in $S$.

Preservation of value sets. If $X \in var(S)$ is not a join attribute, i.e.,
$X \notin var(R) \cap var(S)$, then $V(X, R \bowtie S) = V(X, S)$. That is, when
performing a join operation, we do not lose values for non-join variables.

The interested reader can find a more detailed discussion of these assumptions
and of this selectivity criterion in [9].

Here, we just observe that, because of the above assumptions, after the choice
of atom $A$, we update the statistics of the value sets of its variables as follows:
$V(X, B_i) = \min\{V(X, B_{i-1}), V(X, A)\}$ if $X$ is a bound variable at step $i$;
otherwise, $V(X, B_i) = V(X, A)$.



3.3 Combined Criterion 



This selectivity criterion explicitly deals with both the size of the intermediate
result and the binding of variables, trying to minimize both these factors.

For this criterion we exploit, as additional statistics, the size of the (active)
domains for the variables occurring in $r$ (with respect to the current call of
InstantiateRule($r$)). We estimate this number, denoted by $dom(X)$, by
$\max_{A \in B^+} V(X, A)$. In other words, we assume that there is a relation
$rel(A)$, associated with some $A \in B^+$, which provides the active domain for
$X$, i.e., which contains all values for $X$ that also occur in the extensions of
other atoms in $r$. In practice, this is the case most of the time and, if not,
$dom(X)$ is usually very close to the cardinality of the actual domain for $X$.

The combined criterion takes as the $i$th atom in the ordered body the atom
$A \in B^+ - B_{i-1}$ that minimizes the selectivity index
$sel_c(A) = sel_s(A) \cdot sel_b(A)$, where

    $sel_s(A) = \frac{T(B_{i-1} \ltimes A)}{\prod_{X \in Z} dom(X)}$

and

    $sel_b(A) = \prod_{Y \in var(B_{i-1}) \cap var(A)} \frac{V(Y, A)}{dom(Y)^2}$,

where $Z$ is the set of variables that $A$ has in common with some other atom
occurring in $B^+$, $\ltimes$ denotes the semijoin operation, and $sel_b(A) = 1$ in
the trivial case $var(B_{i-1}) \cap var(A) = \emptyset$.



Example 3. We show how the combined criterion acts on the same rule considered
in Example 1 and Example 2. We estimate the cardinality of the active domains
for the variables. From the given statistics, we get $dom(X) = 100$,
$dom(Y) = 100$, $dom(X') = 20$ and $dom(Y') = 20$. For the atom $P(X, X')$,
according to the Combined criterion, we compute

    $T(B_{i-1} \ltimes P) = T(P) \cdot \frac{V(X, B_{i-1})}{dom(X)} = 30 \cdot \frac{30}{100} = 9$

and hence

    $sel_s(P) = \frac{T(B_{i-1} \ltimes P)}{dom(X) \cdot dom(X')} = \frac{9}{100 \cdot 20}$
    and
    $sel_b(P) = \frac{V(X, P)}{dom(X)^2} = \frac{30}{100^2}$.

Then, the selectivity index for $P$ is

    $sel_c(P) = sel_s(P) \cdot sel_b(P) = \frac{9}{100 \cdot 20} \cdot \frac{30}{100^2} = 1.35\mathrm{e}{-5}$.

Similarly, for $Q$ and $R$, we get $sel_c(Q) = 7.5\mathrm{e}{-8}$ and
$sel_c(R) = 1.69\mathrm{e}{-12}$. It follows that, according to the Combined
criterion, we choose $R(X, Y, X', Y')$ as the $i$th atom in the ordered body of $r$.
Note that, after the choice of this atom, all the variables become bound.
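The numbers of Example 3 can be reproduced with the following Python sketch. It
uses the semijoin estimate $T(R \ltimes S) = T(S) \cdot \prod_X V(X, R)/dom(X)$ that
the computation for $P$ above instantiates. Function names and the dictionary
layout are assumptions for illustration; only statistics for the bound variables
of each atom are needed.

```python
from math import prod


def semijoin_size(t_a, v_b, dom, shared):
    """Estimate T(B semijoin A): tuples of A compatible with the placed atoms."""
    return t_a * prod(v_b[x] / dom[x] for x in shared)


def sel_c(t_a, v_a, v_b, dom, z):
    """Combined index sel_s(A) * sel_b(A).

    v_a: V(Y, A) for the variables of A (at least the bound ones)
    v_b: V(X, B_{i-1}) for the bound variables
    z:   variables that A shares with some other atom of the body
    """
    shared = v_a.keys() & v_b.keys()
    sel_s = semijoin_size(t_a, v_b, dom, shared) / prod(dom[x] for x in z)
    sel_b = prod(v_a[y] / dom[y] ** 2 for y in shared)   # equals 1 if nothing is shared
    return sel_s * sel_b


dom = {"X": 100, "Y": 100, "X'": 20, "Y'": 20}
v_b = {"X": 30, "Y": 5}
atoms = {
    "P": (30, {"X": 30}, ["X", "X'"]),
    "Q": (6, {"Y": 5}, ["Y", "Y'"]),
    "R": (300, {"X": 30, "Y": 5}, ["X", "Y", "X'", "Y'"]),
}
indices = {name: sel_c(t, v, v_b, dom, z) for name, (t, v, z) in atoms.items()}
best = min(indices, key=indices.get)
```

Under these statistics the exact value for $R$ is 1.6875e-12, which the example
rounds to 1.69e-12.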

Note that the selectivity index $sel_s(A)$ is a measure of how much the choice
of $A$ reduces the search space for possible substitutions. In fact, for the set of
variables $Z$ that $A$ has in common with some other atom in $B^+$, the full search
space counts $\prod_{X \in Z} dom(X)$ possible substitutions (or, equivalently,
this is the size of the full relation over these variables). However, only
$m = T(B_{i-1} \ltimes A)$ tuples of $A$ are compatible with the previously chosen
atoms. Thus, $m$ represents the new maximum number of tuples of values for the
variables in $Z$. Note that this criterion tends to prefer atoms with large
arities. Assume the extensions of two atoms $A'$ and $A''$ have the same
cardinality, that the domains of all variables are the same, and that the arity of
$A'$ is greater than the arity of $A''$. Then, $sel_c(A')$ and $sel_c(A'')$ have the
same numerators; however, $sel_c(A')$ has a bigger denominator and, in fact, most
likely provides a better reduction of the search space.

The selectivity index $sel_b(A)$ takes into account the bound variables of $A$.
Indeed, by preferring atoms with already bound variables, we may detect possible
inconsistencies very quickly. The index $sel_b(A)$ is 1 if $A$ has no variable in
common with the previously chosen atoms; otherwise, it is always < 1. It tends
to prefer the atoms with a large number of bound variables and with a smaller
fraction of values with respect to the full domain cardinalities. Indeed, these
atoms are the most promising for detecting possible inconsistencies with the
previously chosen atoms.

In the implementation of this criterion, we make further use of the domains of
variables to remove the assumption about containment of value sets. However,
we keep the assumption, implicit in the classical join-size estimation, that values
are distributed uniformly over their domains. It follows that the size of the
semijoin operation can be estimated as follows:

    $T(R \ltimes S) = T(S) \cdot \prod_{X \in var(R) \cap var(S)} \frac{V(X, R)}{dom(X)}$

Moreover, after we choose an atom $A$, we update the statistics of the value
sets of its variables as follows: $V(X, B_i) = V(X, B_{i-1}) \cdot (V(X, A)/dom(X))$
if $X$ is a bound variable at step $i$; otherwise, $V(X, B_i) = V(X, A)$.
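To contrast this update with the one used by the Join-selectivity criterion of
Section 3.2 (which takes the minimum of the two value counts), the following
Python sketch implements both rules side by side. The function names are
hypothetical, and the values assumed for $V(X', R)$ and $V(Y', R)$ are invented for
the illustration, since the examples do not provide them.

```python
def update_join_selectivity(v_b, v_a):
    """V(X, B_i) = min{V(X, B_{i-1}), V(X, A)} for bound X, else V(X, A)."""
    new = dict(v_b)
    for x, v in v_a.items():
        new[x] = min(v_b[x], v) if x in v_b else v
    return new


def update_combined(v_b, v_a, dom):
    """V(X, B_i) = V(X, B_{i-1}) * V(X, A) / dom(X) for bound X, else V(X, A)."""
    new = dict(v_b)
    for x, v in v_a.items():
        new[x] = v_b[x] * v / dom[x] if x in v_b else v
    return new


# Statistics of Examples 2 and 3 after choosing R(X, Y, X', Y').
v_b = {"X": 30, "Y": 5}
v_r = {"X": 30, "Y": 5, "X'": 20, "Y'": 20}   # counts for X', Y' are assumed
dom = {"X": 100, "Y": 100, "X'": 20, "Y'": 20}

after_join_sel = update_join_selectivity(v_b, v_r)
after_combined = update_combined(v_b, v_r, dom)
```

The product rule shrinks the estimates for bound variables much faster than the
minimum rule; note that it may yield values below 1, which are estimates rather
than actual counts.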

4 Experimental Results and Conclusion 

4.1 Benchmark Programs 

In order to check the efficiency of the proposed methods, we have implemented
them in the grounding engine of the DLV system, and we have run them on a
collection of benchmark programs taken from different domains. We mainly
selected programs where the instantiation process is hard and takes a relevant
part of the entire computation (e.g., CRISTAL, HANOI, RAMSEY), but we also
considered a couple of problems where the instantiation process is easy compared
to the process of model generation (BLOCKSWORLD and HAMILTONIAN-PATH). Due to
space limitations, we cannot include the code of the benchmark programs in the
paper. Rather, we provide below a very short description of the problems which
are encoded in the benchmark programs. The programs encoding these problems, as
well as the binaries used for our experiments, can be found at
www.dbai.tuwien.ac.at/staff/leone/join-ordering/

RAMSEY(3,6)/17 Prove that 17 is not the Ramsey number Ramsey(3,6) [16].

HANOI[6discs,63steps] Hanoi Towers with 6 discs and 63 steps.

CRISTAL A deductive database application that involves complex knowledge
manipulations on databases, developed at CERN in Switzerland.

K-DECOMP Decide whether a conjunctive query has hypertree width at most
K [11].

TIMETABLING A timetable problem for the first year of the faculty of Science
of the University of Calabria.

HAMILTONIAN-PATH Hamiltonian Path on a random graph with 700 edges
and 85 nodes.

BLOCKSWORLD A typical planning problem where some blocks, placed on a
table, have to be moved from an initial position to a desired final position.

CONSTRAINT-3COL 3col, constraint-satisfaction like encoding, on a graph
with 30 nodes and 40 edges.



4.2 Old-DLV Instantiator vs. the New Methods 

We implemented in DLV the three criteria described in Section 3 and we com- 
pared them by using the above benchmark problems. All experiments were 
performed on an Athlon/750 machine with 256MB of main memory running 
FreeBSD 4.2. The binaries were produced with GCC 2.95.2. 



Table 1. A comparison of the join-ordering methods of Section 3

Program                  Old-DLV    JoinSel   Combined
RAMSEY(3,6)/17             64.10       8.98       8.50
HANOI[6discs,63steps]      12.20      71.65      14.58
CRISTAL                    19.53      14.73      13.37
K-DECOMP                   30.78      37.84      29.82
TIMETABLING               283.15     269.03     238.35
HAMILTONIAN-PATH            2.55       2.43       2.41
BLOCKSWORLD                 3.17       3.48       2.99
CONSTRAINT-3COL            84.01      34.98      31.64

The results of our tests are shown in Table 1. There, the first column gives
the benchmark program; columns 2-4 report the running times needed by DLV to
generate the instantiation when the Old-DLV, JoinSel, and Combined methods are
used, respectively. All running times are expressed in seconds.

Old-DLV, the original technique employed in the DLV system, is the worst
in most cases and is outperformed by both JoinSel and Combined. It is worth
noting, however, that the Old-DLV criterion is not a "naive" method: it takes
into account both the binding of variables and the size of the extensions of
atoms. In fact, it performs quite well on a number of problems, e.g.,
HAMILTONIAN-PATH, BLOCKSWORLD, and K-DECOMP. In particular, in the latter case,
it is better than the Join-selectivity approach. However, it gets worse for
problems where rules contain many atoms and/or atoms with large extensions, e.g.,
RAMSEY and TIMETABLING.

The Join-selectivity criterion guarantees good performance on a large number 
of programs, because it is based on the minimization of the intermediate partial 
relation computed at each step. In some way, its formulation also takes into 
account the binding of variables. Indeed, a larger number of bound variables in 
an atom leads to more selective joins (i.e., joins with a smaller index). 

The combined criterion yields the best performance for the considered problems
on average. The main advantage of this criterion comes from exploiting the fact
that large-arity atoms can reduce the number of allowed substitutions for
many variables at once, provided that their extensions are not too big. For this
reason, the procedure based on this method outperforms the Join-selectivity
method on HANOI, K-DECOMP and TIMETABLING. Thus, the combined criterion, which
we proposed in this paper, seems to be appropriate for the purpose of body
reordering. It is worth noting that we also tried a number of variants of this
criterion for tuning the contribution of the different factors. However, for the
considered examples, the formula described in Section 3.3 gave the best results
on average.

4.3 The Enhanced DLV Instantiator vs. lparse

Finally, we compared the instantiator of DLV against lparse, the instantiator of
Smodels [14], a prominent ASP system. (Since the benchmark programs are
head-cycle free, we could eliminate disjunction and translate them into the
language accepted by lparse.) The newest version of lparse (release 1.0.4,
03-21-2001) accepts logic programs respecting the extended-domain restriction.
This condition requires each variable of a rule to occur in a positive body
literal, called domain literal, which (i) is not mutually recursive with the
head, and (ii) is neither unstratified nor (transitively) dependent on an
unstratified literal (see the Smodels manual in [18] for details). To instantiate
a rule $r$, lparse employs a nested loop scanning the extensions of the domain
predicates occurring in its body, and generates the ground instances of $r$
accordingly (i.e., by applying the substitutions obtained from the domain atoms
and disregarding the substitutions violating either some built-in predicate or
some variable patterns). Table 2 shows a comparison between the instantiator of
DLV (with the combined criterion) and lparse (release 1.0.4). For both systems,
we report the time (CPU + I/O time) they take to instantiate the program and the
size (number of rules) of the output instantiation. The symbol '-' means that the
instantiator did not terminate within 20 minutes.

Note that, even if both DLV and lparse compute ground programs that are
equivalent (with respect to the answer set semantics), the sizes of the respective
instantiations may differ significantly. This is due to the different ways they
instantiate a rule $r$: DLV computes the join of the extensions of the positive
literals in the body of $r$, while lparse enumerates with a nested loop all the
extensions of the domain predicates in a rule. Thus, the strategy of lparse is
computationally less expensive (since no join is computed) if the cartesian
product of these extensions is small, i.e., if there are few domains to scan or
they have small extensions. However, lparse may produce a needlessly larger
instantiation than DLV, since the rules generated by lparse may contain
non-domain body literals which are certainly not derivable (i.e., they do not
appear in the head of any rule of the instantiation having an applicable body).
Indeed, the results in Table 2 show that lparse is sometimes very fast and in fact
faster than DLV (e.g., for K-DECOMP, TIMETABLING and HAMILTONIAN-PATH, where
there are few domain predicates in each rule); but the size of the instantiation
produced by lparse is always larger than the size of the instantiation computed by
DLV. The size difference is relevant if several atoms (directly or transitively)
depend on unstratified literals, as, e.g., in TIMETABLING.



Table 2. The new instantiator of DLV vs. the instantiator of Smodels

                                  DLV                   lparse
Program                     time       size       time         size
RAMSEY(3,6)/17              8.50     13,344          -            -
HANOI[6discs,63steps]      14.58     62,413          -            -
CRISTAL                    13.37     20,978          -            -
K-DECOMP                   29.82    121,798       9.81      123,165
TIMETABLING               238.35    199,551      88.60    3,002,700
HAMILTONIAN-PATH            2.41     49,674       1.04       52,511
BLOCKSWORLD                 2.99     46,872       9.73      459,706
CONSTRAINT-3COL            31.64          7      805.4            7


However, realistic applications often work on large domains or require several
variables per rule. It follows that, for meaningful problems such as CRISTAL,
HANOI and RAMSEY, lparse is not able to compute the instantiation in a
reasonable time (we stopped the program after 20 minutes). (HANOI and RAMSEY
are the benchmark problems proposed at the AAAI Spring Symposium on ASP
Programming, March 2001.) In CONSTRAINT-3COL and RAMSEY, variable domains are
not large; however, some rules contain a large number of domains, whose cartesian
product is big and slows down the technique adopted by lparse.

Moreover, in order to evaluate the quality of the ground programs produced by
the two instantiators, we are running a number of experiments with Smodels
on the ground programs produced by the DLV instantiator and by lparse. Our
preliminary results on the benchmark examples are very interesting. For instance,
lparse is faster than DLV in producing a ground program for HAMILTONIAN-PATH.
However, Smodels performs very badly with this program as its input, while it is
very fast on the ground program produced by the DLV instantiator.

In conclusion, the experiments confirm that the database techniques that we
implemented in DLV are very useful. Further techniques and results from
the field of database optimization should be carried over to the area of knowledge
base systems to improve the efficiency of these systems.

Abstract. Most SAT solvers and Answer Set Programming (ASP) systems
employ a backtracking search by repeatedly assuming the truth of
literals. The choice of these branching literals is crucial for the
performance of these systems.

Competitive ASP systems employ advanced heuristics to select branching
literals, which are usually based on "look-ahead" techniques: to evaluate
the heuristic value of a literal L, truth and falsity of L are assumed in the
current interpretation, consequences are derived, and the quality of the
resulting interpretations is evaluated. This process can be very expensive,
and often consumes most of the time taken by an ASP system.

In this paper, we present two techniques to optimize the computation
of the heuristics in the ASP system DLV. The first technique singles out
pairs of literals (A, not B) having precisely the same consequences, which
allows for making only one look-ahead for each of these pairs. The second
technique (inspired by SAT solvers) is a 2-layered heuristic, in which a
simple heuristic criterion reduces the set of literals to be looked ahead.
We implement both techniques in the ASP system DLV and evaluate
their efficiency on a number of benchmark problems taken from various
domains. The experiments confirm the usefulness of both techniques,
which significantly improve the performance of DLV.



1 Introduction 

DLV is a knowledge representation system based on disjunctive logic programming
(DLP) [Min82,GL91] offering front-ends to several Knowledge Representation
(KR) formalisms [ELM+98b,ELM+98a,EFLP99]. A strong point of DLV is its
highly expressive language, which allows elegant and natural representations
of very hard problems (up to $\Delta_3^P$-hard problems). DLV supports a
declarative programming style which has recently been termed Answer Set
Programming (ASP), hence it is referred to as an ASP system. The idea of ASP is to
represent a given computational problem by a logic program whose answer sets
correspond to solutions, and then use an answer set solver (like DLV) to compute
them [Lif99].

* This work was supported by FWF (Austrian Science Funds) under the projects
Z29-INF and P14781 and MURST under project COFIN-2000 "From Data to
Information (D2I)".

An efficient support for the highly expressive language of DLV requires the 
use of smart algorithms and data structures as well as sophisticated optimization 
techniques in order to deal with such hard computational tasks. 

DLV employs backtracking search by repeatedly assuming the truth of literals
[FLP99], and in order to improve the efficiency of the DLV system, in
a previous paper [FLP01] we experimented with a number of heuristics
for deciding which branching literal to assume. These heuristics are based on
"look-ahead" techniques: to evaluate the heuristic value of a literal $A$ w.r.t.
the interpretation $I$ at hand, truth and falsity of $A$ are assumed in the current
interpretation, and its consequences are derived by computing its deterministic
extensions $I' = DetCons(I \cup \{A\})$ (the interpretation $I'$ is guaranteed
to be contained precisely in the same answer sets containing $I \cup \{A\}$) and
$I'' = DetCons(I \cup \{\mathit{not}\ A\})$. Note that $I'$ and $I''$ can be
inconsistent, in which case the search space can be pruned early.

The heuristic value of $A$ is a measure of the "quality" of the resulting
interpretations $I'$ and $I''$. Some of these heuristics proved to be very useful,
as they drastically reduce the number of choice points arising in an ASP
computation. However, the computation of these heuristics is very expensive,
since the number of literals to be "looked ahead" is very large in some cases, and
the cost of a look-ahead is linear in the size of the Herbrand base in the worst
case. The computation of the heuristics thus often consumes most of the total time
taken by an ASP system, and may slow down the ASP system significantly.

In this paper, we try to reduce the amount of time needed to evaluate the 
heuristics, by reducing the number of look-aheads that need to be performed. 
The main contributions of the paper are the following: 

A. We define a new condition which is sufficient to guarantee that, at a given
stage of the computation, two literals (A, not B) have precisely the same
set of deterministic consequences w.r.t. the interpretation $I$ at hand, that
is, $DetCons(I \cup \{A\}) = DetCons(I \cup \{\mathit{not}\ B\})$. Consequently,
A and not B are guaranteed to have precisely the same heuristic values, and we
avoid the look-ahead for one of them.

This technique allows us to save 50% of the look-aheads in several cases
including, e.g., Hamiltonian Path and 3SAT programs.

B. We design a 2-layered heuristic. A computationally cheap heuristic criterion 
reduces the set of literals to be considered, and the look-ahead to select 
the branching literal is applied only to the literals in this set. This method 
significantly reduces the number of look-aheads, but, unlike the previous 
technique, it is not an “exact” method, that is, it might exclude literals 
which would otherwise have had high heuristic values. Also some literals for 
which the look-ahead detects inconsistency can be missed in this way, so 
there will be less pruning in general. 

C. We implement the above techniques in the ASP system DLV and evaluate
their efficiency on a number of benchmark problems taken from various
domains. The results of the experiments are very positive and both techniques
prove to be useful. Moreover, they are orthogonal and their integration
performs at least as well as the best individual technique, resulting in a
substantial improvement of the performance of the DLV system.

In addition to the above contributions, we explain in detail the heuristic
criterion adopted in DLV for the selection of the branching literal, and how
it is computed.

It is worth noting that techniques for reducing the number of look-aheads
have been employed in SAT solvers and in other ASP systems. In particular,
the ASP system Smodels drastically prunes the look-aheads by eliminating each
literal which has been derived during a previous look-ahead at the same
branch-point: for each literal B ∈ DetCons(I ∪ {A}), the look-ahead for B is
not performed, because B is guaranteed to be worse than A w.r.t. the heuristic
function of Smodels. This technique eliminates a higher number of look-aheads
than our technique described in Item A, but our technique is more general and
is applicable to a wider class of heuristic functions. Indeed, the technique
of Smodels relies on a monotonicity property of the heuristic:
DetCons(I ∪ {B}) ⊆ DetCons(I ∪ {A}) implies that B is worse than A w.r.t. the
heuristic function of Smodels. Our technique, instead, is applicable to every
criterion determining the heuristic value from the result of the look-ahead
(i.e., the heuristic value of A depends only on DetCons(I ∪ {A})). In fact,
our technique can also be applied in Smodels, while the optimization employed
by Smodels cannot be used in DLV, since the heuristic employed in DLV is not
monotonic in the sense described above. A 2-layered heuristic similar to the
technique of Item B above has been successfully employed in the SAT solver
SATZ [LA97].

2 Answer Set Programming Language

In this section, we provide a formal definition of the syntax and semantics of 
the ASP language supported by DLV: disjunctive datalog extended with strong 
negation. For further background, see [GL91,EFLP00]. 



ASP Programs A (disjunctive) rule r is a formula

a_1 ∨ ... ∨ a_n :- b_1, ..., b_k, not b_{k+1}, ..., not b_m.

where a_1, ..., a_n, b_1, ..., b_m are classical literals (atoms possibly
preceded by the classical negation symbol ¬) and n ≥ 0, m ≥ k ≥ 0. The
disjunction a_1 ∨ ... ∨ a_n is the head of r, while the conjunction
b_1, ..., b_k, not b_{k+1}, ..., not b_m is the body, b_1, ..., b_k the
positive body, and not b_{k+1}, ..., not b_m the negative body of r. The sets
of literals in the head, body, positive body and negative body of r are
denoted by H(r), B(r), B^+(r), and B^-(r), respectively. Comparison operators
(like =, <, >, <>) are built-in predicates in ASP systems, and may appear in
the bodies of rules. A disjunctive datalog program (also called ASP program in
this paper) P is a finite set of rules.

As usual, an object (atom, rule, etc.) is called ground or propositional, if it 
contains no variables. 

Answer Sets We describe the semantics of consistent answer sets, which was
originally defined in [GL91].

Given a program P, let the Herbrand Universe U_P be the set of all constants
appearing in P and the Herbrand Base B_P be the set of all possible
combinations of predicate symbols appearing in P with constants of U_P,
possibly preceded by ¬.

Given a rule r, Ground(r) denotes the set of rules obtained by applying
all possible substitutions σ from the variables in r to elements of U_P.
Similarly, given a program P, the ground instantiation Ground(P) of P is the
set ⋃_{r ∈ P} Ground(r).

For every program P, we define its answer sets using its ground instantiation
Ground(P) in two steps, following [Lif96]: first for positive programs, then
for general programs.

A set L of literals is said to be consistent if, for every literal ℓ ∈ L, its
complementary literal is not contained in L. An interpretation I is a
consistent set of ground literals. An interpretation I ⊆ B_P is closed under P
(where P is a positive program), if, for every r ∈ Ground(P), at least one
literal in the head is true whenever all literals in the body are true. I is an
answer set for P if it is minimal w.r.t. set inclusion and closed under P.

The reduct or Gelfond-Lifschitz transform of a general ground program P
w.r.t. an interpretation I is the positive ground program P^I, obtained from P
by (i) deleting all rules r ∈ P whose negative body is false w.r.t. I, (ii)
deleting the negative body from the remaining rules. An answer set of a general
program P is an interpretation I such that I is an answer set of Ground(P)^I.
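
The two-step definition above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not part of the paper: the program is already ground, classical literals are treated as plain atom names (strong negation and the consistency requirement are ignored), and minimality is checked by brute force over all subsets, which is exponential. The names Rule, rule, reduct, is_closed and is_answer_set are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, NamedTuple, Set


class Rule(NamedTuple):
    head: FrozenSet[str]  # H(r)
    pos: FrozenSet[str]   # B+(r)
    neg: FrozenSet[str]   # atoms a such that "not a" occurs in B-(r)


def rule(head: str = "", pos: str = "", neg: str = "") -> Rule:
    """Hypothetical helper: build a rule from space-separated atom names."""
    return Rule(frozenset(head.split()), frozenset(pos.split()),
                frozenset(neg.split()))


def reduct(program: List[Rule], interp: Set[str]) -> List[Rule]:
    """Gelfond-Lifschitz transform: (i) drop rules whose negative body is
    false w.r.t. interp, (ii) drop the negative body of the remaining rules."""
    return [Rule(r.head, r.pos, frozenset())
            for r in program if not (r.neg & interp)]


def is_closed(positive_program: List[Rule], interp: Set[str]) -> bool:
    """Whenever the body of a rule is true, some head literal is true."""
    return all(r.head & interp for r in positive_program if r.pos <= interp)


def is_answer_set(program: List[Rule], interp: Set[str]) -> bool:
    """interp is an answer set iff it is a minimal closed set of the reduct."""
    red = reduct(program, interp)
    if not is_closed(red, interp):
        return False
    atoms = sorted(interp)
    # Brute-force minimality: no proper subset may be closed as well.
    for size in range(len(atoms)):
        for subset in combinations(atoms, size):
            if is_closed(red, set(subset)):
                return False
    return True
```

For the one-rule program a ∨ b, `is_answer_set([rule("a b")], {"a"})` holds while the non-minimal candidate `{"a", "b"}` is rejected.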

3 Answer Set Computation 

In this section, we describe the main steps of the computational process
performed by ASP systems. We refer particularly to the computational engine
of the DLV system, but other ASP systems, like Smodels, employ a very
similar procedure.

An answer set program P in general contains variables. The first step of a
computation of an ASP system eliminates these variables, generating a ground
instantiation of P.^ The hard part of the computation is then performed on this
ground ASP program.

The heart of the computation is performed by the Model Generator, which is 
sketched in Figure 1. Roughly, the Model Generator produces some “candidate” 

^ This ground instantiation is required to have precisely the same answer sets as P,
and is usually much smaller than Ground(P) [FLMP99].

Forward Function DetCons(I: Interpretation): Interpretation;

(* Extends I with the literals that can be deterministically inferred and returns the
resulting interpretation, or the set of all literals ℒ upon inconsistency [FLP99]. *)
Forward Procedure Select(var I: Interpretation, var L: ClassicalLiteral);

(* Selects the classical literal L having the highest heuristic value (see Section 4). *)

Function ModelGenerator(var I: Interpretation): Boolean;

(* The function returns True iff I can be extended to an answer set. *)
var inconsistency: Boolean;

begin
    I := DetCons(I);
    if I = ℒ then return False;   (* inconsistency detected *)
    if no literal is undefined in I then return IsAnswerSet(I);
    Select(I, L);
    if ModelGenerator(I ∪ {L}) then return True;
    else return ModelGenerator(I ∪ {not L});
end;

Fig. 1. Computation of Answer Sets 



answer sets. The stability of each of these is subsequently verified by the function
IsAnswerSet(I), which checks whether the given “candidate” I is a minimal
model of the program Ground(P)^I, the reduct of Ground(P) w.r.t. I.

The ModelGenerator function is first invoked with parameter I set to the
empty interpretation.^ If the program P has an answer set, the function returns
True, setting I to that answer set; otherwise it returns False. The Model
Generator is similar to the Davis-Putnam procedure employed by SAT solvers. It
first calls the function DetCons, which extends I with those literals that can
be deterministically inferred from I. DetCons is similar to a unit propagation
procedure employed by SAT solvers, but exploits the peculiarities of ASP for
making further inferences (e.g., it exploits the knowledge that every answer
set is a minimal model). If DetCons does not detect any inconsistency, a
classical literal L is selected according to a heuristic criterion by a call to
the Select procedure. ModelGenerator is then recursively called on I ∪ {L}; if
this call does not generate an answer set (i.e., I ∪ {L} is not contained in
any answer set), it is called on I ∪ {not L}. The classical literal L plays the
role of a branching variable of a SAT solver, and indeed the selection of a
“good” literal L is crucial for the performance of an ASP system. In the next
section, we describe the heuristic criterion adopted by DLV for the selection
of such branching literals, and how Select is implemented.
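
To make the analogy with SAT solvers concrete, the following Python sketch shows the Davis-Putnam-style skeleton on propositional clauses rather than on ASP rules. It is not DLV's Model Generator: unit_propagate only plays the role of DetCons (without the ASP-specific inferences), select is a placeholder that applies no heuristic, and the DIMACS-style clause encoding is an assumption made for the example.

```python
from typing import Dict, List, Optional

Clause = List[int]           # DIMACS style: 3 means x3, -3 means "not x3"
Assignment = Dict[int, bool]


def unit_propagate(clauses: List[Clause],
                   assignment: Assignment) -> Optional[Assignment]:
    """Analogue of DetCons: add all forced values; None signals inconsistency."""
    assignment = dict(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(assignment.get(abs(l)) == (l > 0) for l in clause):
                continue  # clause is already satisfied
            unassigned = [l for l in clause if abs(l) not in assignment]
            if not unassigned:
                return None  # every literal of the clause is false
            if len(unassigned) == 1:
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0
                changed = True
    return assignment


def select(clauses: List[Clause], assignment: Assignment) -> Optional[int]:
    """Placeholder for Select: first undefined variable, no heuristic."""
    for clause in clauses:
        for l in clause:
            if abs(l) not in assignment:
                return abs(l)
    return None


def model_generator(clauses: List[Clause],
                    assignment: Optional[Assignment] = None) -> Optional[Assignment]:
    """Return a satisfying assignment extending the given one, or None."""
    extended = unit_propagate(clauses, assignment or {})
    if extended is None:
        return None  # inconsistency detected
    var = select(clauses, extended)
    if var is None:
        return extended  # nothing undefined, so every clause is satisfied
    for value in (True, False):  # first assume L, then "not L"
        result = model_generator(clauses, {**extended, var: value})
        if result is not None:
            return result
    return None
```

In the ASP setting the final step additionally calls IsAnswerSet, since a total interpretation that survives propagation is only a candidate.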

^ Observe that the interpretations built during the computation are 3-valued and an
interpretation I is a set of ground literals. A ground classical literal A is True (resp.
False) w.r.t. I if A ∈ I (resp. not A ∈ I); otherwise A is Undefined w.r.t. I.

4 Evaluation of the Heuristic Function

In this section, we define the heuristic criterion adopted in the DLV system and 
we describe how it is evaluated. 

The heuristics of DLV is a “dynamic heuristics” (the ASP equivalent of UP 
heuristics for SAT), that is, the heuristic value of a literal Q depends on the result 
of taking Q true as well as false and computing its consequences, respectively. In 
order to reduce the number of look-aheads, the DLV system does not evaluate the 
heuristic value of all undefined classical literals; rather, it considers only a subset 
of the undefined classical literals, called possibly-true literals. The correctness of 
this strategy, adopted since the first release of DLV, is shown in [LRS97]. 

Definition 1. A Possibly-True (PT) literal of P w.r.t. an interpretation I is
an undefined classical literal p such that there exists a rule r ∈ Ground(P) for
which all of the following conditions hold:

1. p is in the head of r: p ∈ H(r);

2. the head of r is not true w.r.t. I: H(r) ∩ I = ∅;

3. the positive body of r is true w.r.t. I: B^+(r) ⊆ I;

4. the negative body of r is not false w.r.t. I: I ∩ {a : not a ∈ B^-(r)} = ∅.

The set of all PT literals of P w.r.t. I is denoted by PT_P(I). □

Example 1. Consider the program P = {a ∨ b :- c. d :- not a, not f. e ∨ f :- k.}
and let I = {c} be an interpretation for P, then PT_P(I) = {a, b, d}.
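
Definition 1 translates almost literally into code. The sketch below is an illustration under assumptions of its own: the program is ground, a 3-valued interpretation is represented by two sets (the true literals and the literals whose "not" is in I), and the names Rule and possibly_true are hypothetical.

```python
from typing import FrozenSet, Iterable, List, NamedTuple, Set


class Rule(NamedTuple):
    head: FrozenSet[str]  # H(r)
    pos: FrozenSet[str]   # B+(r)
    neg: FrozenSet[str]   # atoms a such that "not a" occurs in B-(r)


def possibly_true(program: List[Rule],
                  true_lits: Set[str],
                  false_lits: Iterable[str] = frozenset()) -> Set[str]:
    """Collect the PT literals of a ground program w.r.t. an interpretation."""
    false_lits = set(false_lits)
    pt: Set[str] = set()
    for r in program:
        head_not_true = not (r.head & true_lits)      # condition 2
        pos_body_true = r.pos <= true_lits            # condition 3
        neg_body_not_false = not (r.neg & true_lits)  # condition 4
        if head_not_true and pos_body_true and neg_body_not_false:
            # condition 1, restricted to undefined head literals
            pt |= {p for p in r.head if p not in false_lits}
    return pt


# The program of Example 1 with I = {c}.
example = [
    Rule(frozenset({"a", "b"}), frozenset({"c"}), frozenset()),
    Rule(frozenset({"d"}), frozenset(), frozenset({"a", "f"})),
    Rule(frozenset({"e", "f"}), frozenset({"k"}), frozenset()),
]
print(sorted(possibly_true(example, {"c"})))  # ['a', 'b', 'd']
```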

As shown in Figure 2 (initial foreach statement), DLV’s heuristic function is 
evaluated only on the PT literals. It is worthwhile noting, however, that the PT 
literals do not always restrict the set of classical literals to be looked-ahead, since 
all undefined literals are PT in some cases. For instance, in the program encoding 
3SAT (see Section 7.1) every undefined literal is a PT literal, as it occurs in 
the head of a rule having a true (empty) body. In contrast, in the program 
HAMPATH, at a given stage of the computation, the PTs are only those literals 
of the form inPath{a,b) or outPath{a,b), where a is a node already reached 
from the start (reached{a) is True) and (a, b) is an arc of the input graph. 

Let us now turn our attention to the heuristic criterion adopted in DLV to 
choose the “best” among the PT literals. 

A peculiar property of answer sets is supportedness: For each true classical
literal A in an answer set I, there exists a rule r of the program such that the
body of r is true w.r.t. I and A is the only true literal in the head of r (r is then
called a supporting rule for A). Since an ASP system must eventually converge
to a supported interpretation, ASP systems try to keep the interpretations “as
much supported as possible” during the intermediate steps of the computation.
To this end, the DLV system counts the number of UnsupportedTrue (UT) literals,
i.e., classical literals which are true in the current interpretation but still miss
a supporting rule (in [FLP99] UTs, called MBTs there, are discussed in detail).
For instance, the rule :- not x implies that x must be true in each answer set of

Procedure Select(var I: Interpretation, var L: ClassicalLiteral);
var I_A^+, I_A^-: Interpretation;

begin
    L := NULL;
    foreach A ∈ PT_P(I) do
        I_A^+ := DetCons(I ∪ {A});              (* look-ahead for A *)
        if I_A^+ = ℒ then I := I ∪ {not A};
        else I_A^- := DetCons(I ∪ {not A});     (* look-ahead for not A *)
            if I_A^- = ℒ then I := I ∪ {A}; endif
        endif

        if I_A^+ ≠ ℒ and I_A^- ≠ ℒ then         (* no inconsistency has arisen *)
            if L = NULL then L := A;            (* first literal, no comparison *)
            (* compare A against L w.r.t. the heuristic *)
            elseif ( UT(I_A^+) + UT(I_A^-) ) < ( UT(I_L^+) + UT(I_L^-) ) then L := A;
            elseif ( UT2(I_A^+) + UT2(I_A^-) ) < ( UT2(I_L^+) + UT2(I_L^-) ) then L := A;
            elseif ( UT3(I_A^+) + UT3(I_A^-) ) < ( UT3(I_L^+) + UT3(I_L^-) ) then L := A;
            elseif ( US(I_A^+) + US(I_A^-) ) < ( US(I_L^+) + US(I_L^-) ) then L := A;
    endfor
end;

Fig. 2. Selection of the Branching Literal by DLV’s Heuristic 



the program, but it does not give a “support” for x. Thus, in the DLV system x
is assumed true in the current interpretation to satisfy that rule, and it is added
to the set of UnsupportedTrue literals; it will be removed from this set once a
supporting rule for x is found (e.g., x ∨ b :- c is a supporting rule for x in
the interpretation I = {x, not b, c}).

Intuitively, since the set of UnsupportedTrue literals must eventually be 
empty when an answer set is reached, the heuristic of DLV tries to minimize 
the number of UT literals, taking particular care of those UT literals which are 
more “in danger” (a UT literal appearing in the head of fewer rules is more in
danger than a literal appearing in the head of many rules).

Given an interpretation I, let UT(I) be the number of UT literals in I.
Moreover, let UT2(I) and UT3(I) be the number of UT literals occurring,
respectively, in the heads of 2 and 3 rules (which are not already satisfied
w.r.t. I, and can therefore be potentially used to support the UT literal).^
The heuristic of DLV considers UT(I), UT2(I) and UT3(I) in a prioritized way to
favor literals yielding interpretations with fewer UT/UT2/UT3 literals (which
should more likely lead to a supported model). If all UT counters are equal,
then the heuristic minimizes US(I), the number of unsatisfied rules w.r.t. I.

Since the failure of the computation branch selecting A True starts a new
branch assuming not A (see last instruction in Figure 1), the heuristic criterion
considers the effect of choosing a literal A and its complement not A in a
balanced way. To this end, the counters UT(I_A^+), UT2(I_A^+), UT3(I_A^+), and US(I_A^+),

^ UT1 literals do not exist in DLV computations. Whenever a rule r is the last
potentially supporting rule for a UT literal A, then A is inferred via r (see [FLP99]).
resulting from the look-ahead on I ∪ {A}, are added, respectively, to the
counters UT(I_A^-), UT2(I_A^-), UT3(I_A^-), and US(I_A^-), resulting from the
look-ahead on I ∪ {not A}, when evaluating the heuristics.
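
The prioritized comparison can be sketched compactly in Python. The sketch reads “prioritized” as a lexicographic comparison of the summed counters (UT, UT2, UT3, US), which is an interpretation of the text rather than a transcription of DLV's code; the counter values are assumed to be given, and the names Counters, combined and pick are hypothetical.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Counters(NamedTuple):
    ut: int   # UT(I): number of unsupported true literals
    ut2: int  # UT2(I)
    ut3: int  # UT3(I)
    us: int   # US(I): number of unsatisfied rules


def combined(plus: Counters, minus: Counters) -> Tuple[int, ...]:
    """Add the counters of the look-ahead on A to those of the one on not A."""
    return tuple(p + m for p, m in zip(plus, minus))


def pick(lookaheads: Dict[str, Tuple[Counters, Counters]]) -> Optional[str]:
    """Return the literal whose combined counters are lexicographically least.

    Ties keep the literal seen first, as a strict '<' test would.
    """
    best: Optional[str] = None
    best_key: Optional[Tuple[int, ...]] = None
    for literal, (plus, minus) in lookaheads.items():
        key = combined(plus, minus)
        if best_key is None or key < best_key:  # tuple comparison is lexicographic
            best, best_key = literal, key
    return best
```

Summing both look-ahead results before comparing is what makes the criterion balanced: a literal that looks good when assumed true but bad when assumed false is not favored.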

5 Look-Ahead Equivalences 

The value of a dynamic heuristic depends only on the interpretations resulting
from the function DetCons(I ∪ {A}) (resp. DetCons(I ∪ {not A})). It is
therefore interesting to identify cases where two literals L and L' are
look-ahead equivalent, i.e., DetCons(I ∪ {L}) = DetCons(I ∪ {L'}), since one of
the two look-ahead computations could be saved. This notion of equivalence is
formalized next.

Definition 2. Let p and q be two undefined literals w.r.t. an interpretation I.
p and q are look-ahead equivalent if DetCons(I ∪ {p}) = DetCons(I ∪ {q}).

To single out a sufficient and efficiently checkable condition which guarantees 
such an equivalence, we first define the notion of a potentially supporting rule: 

Definition 3. Given a program P, a classical literal a, and a (3-valued)
interpretation I, a rule r ∈ P is a potentially supporting rule for a w.r.t. I,
if the following conditions are satisfied: (i) a occurs in the head of r, (ii)
no literal in H(r) − {a} is true w.r.t. I, and (iii) no literal in the body of
r is false w.r.t. I. Let psupp_P(a, I) denote the number of potentially
supporting rules for a.

We can now formulate the following: 

Proposition 1. If two undefined classical literals a and b occur in the head of
a rule r in a program P, and a and b are the only undefined literals w.r.t. an
interpretation I in r (where we assume that there is no multiple occurrence of
classical literals in rules), then it holds that:

1. If psupp_P(b, I) = 1, then a and not b are look-ahead equivalent.

2. If psupp_P(a, I) = 1, then not a and b are look-ahead equivalent.

Proof. (Sketch) Suppose psupp_P(b, I) = 1. Then r is the only rule in P
which might derive b. Since the body of r is already true in I, such a derivation
is performed iff a becomes false. Therefore, DetCons(I) either contains both a
and not b or it contains none of them. A symmetric argument shows item 2. □

Example 2. Consider the program {a ∨ b.} and I = ∅. Both a and b are PT
literals, so look-ahead for a, not a, b, and not b is performed, i.e. we compute
DetCons({a}) = {a, not b}, DetCons({not a}) = {not a, b}, DetCons({b}) =
{not a, b}, and DetCons({not b}) = {a, not b}. In this example we can save the
look-aheads for not b and b because of Proposition 1, and thus save half of the
look-aheads.
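
The condition of Proposition 1 only requires counting potentially supporting rules, so it can be checked without any look-ahead. The following Python sketch illustrates this on ground programs; it is not DLV's implementation. A 3-valued interpretation is represented by two sets, and, as a conservative assumption added here, the rule r itself must be a potentially supporting rule for the literal in question, which is what the proof sketch relies on. All names are hypothetical.

```python
from typing import FrozenSet, List, NamedTuple, Set, Tuple


class Rule(NamedTuple):
    head: FrozenSet[str]  # H(r)
    pos: FrozenSet[str]   # B+(r)
    neg: FrozenSet[str]   # atoms a such that "not a" occurs in B-(r)


def potentially_supports(r: Rule, a: str,
                         true_lits: Set[str], false_lits: Set[str]) -> bool:
    """Conditions (i)-(iii) of Definition 3."""
    return (a in r.head
            and not ((r.head - {a}) & true_lits)  # no other head literal is true
            and not (r.pos & false_lits)          # no positive body literal is false
            and not (r.neg & true_lits))          # no "not q" in the body is false


def psupp(program: List[Rule], a: str,
          true_lits: Set[str], false_lits: Set[str]) -> int:
    """Number of potentially supporting rules for a."""
    return sum(potentially_supports(r, a, true_lits, false_lits)
               for r in program)


def equivalent_pairs(program: List[Rule], true_lits: Set[str],
                     false_lits: Set[str]) -> Set[Tuple[str, str]]:
    """Pairs (p, q) such that p and "not q" are look-ahead equivalent."""
    defined = true_lits | false_lits
    pairs: Set[Tuple[str, str]] = set()
    for r in program:
        undefined = (r.head | r.pos | r.neg) - defined
        # exactly two undefined literals in r, both occurring in the head
        if len(undefined) != 2 or not undefined <= r.head:
            continue
        a, b = sorted(undefined)
        for p, q in ((a, b), (b, a)):
            if (potentially_supports(r, q, true_lits, false_lits)
                    and psupp(program, q, true_lits, false_lits) == 1):
                pairs.add((p, q))
    return pairs
```

For the program of Example 2 the function reports both (a, b) and (b, a), so the look-aheads for not b and for b can reuse the results computed for a and for not a.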

In DLV computations, we can recognize the applicability of Proposition 1 very 
efficiently and avoid extraneous look-aheads. Experimental results reported in 
Section 8 will show that we avoid up to 50% of look-aheads in some cases (e.g. 
on 3SAT) by exploiting this simple condition. 

6 2-Layered Heuristic

In [LA97] a different idea for reducing look-aheads is presented: An
easy-to-compute heuristic is defined as a first layer, and look-ahead is only
computed on those possible choices which look promising w.r.t. this easier
heuristic. This gives a kind of 2-layered heuristic.

The simple heuristic criteria defined in [LA97] involve the number of binary 
clauses a classical literal occurs in. The rationale is that this is the number of 
immediate propagations that can be performed during the look-ahead. This idea 
can be directly transferred to our ASP framework: 

Definition 4. A binary clause is a rule which contains exactly two undefined 
classical literals w.r.t. an interpretation I. The number of binary occurrences of 
an undefined literal a is the number of binary clauses a occurs in. 

Note that this notion directly corresponds to the number of immediate prop- 
agations which can be performed by assuming a and not a, so it matches the 
intuition of [LA97]. To reduce the number of literals to be looked-ahead, we 
adopt the following criterion: 

First-Layer Heuristic S_bin. Let PT_P(I) be the set of PT literals of a program P
w.r.t. an interpretation I, and let S_bin ⊆ PT_P(I) be the set of PT literals having
more than the average number of binary occurrences w.r.t. all literals in PT_P(I).
Then, consider only the literals in S_bin for the selection of the branching literals
(i.e., make look-ahead only on these literals).
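
A minimal Python sketch of this first layer follows, under assumptions that are not taken from the paper: the program is ground, the set of defined literals is passed in explicitly, and when no PT literal exceeds the average (for instance when all counts are equal) the sketch falls back to the full PT set, a case the text does not discuss. The names are hypothetical.

```python
from typing import FrozenSet, Iterable, List, NamedTuple, Set


class Rule(NamedTuple):
    head: FrozenSet[str]  # H(r)
    pos: FrozenSet[str]   # B+(r)
    neg: FrozenSet[str]   # atoms a such that "not a" occurs in B-(r)


def binary_occurrences(program: List[Rule], lit: str, defined: Set[str]) -> int:
    """Number of rules with exactly two undefined literals, one of them lit."""
    count = 0
    for r in program:
        undefined = (r.head | r.pos | r.neg) - defined
        if len(undefined) == 2 and lit in undefined:
            count += 1
    return count


def first_layer(program: List[Rule], pt_literals: Iterable[str],
                defined: Set[str]) -> Set[str]:
    """Keep the PT literals with more than the average binary occurrences."""
    pt = set(pt_literals)
    if not pt:
        return set()
    occ = {p: binary_occurrences(program, p, defined) for p in pt}
    average = sum(occ.values()) / len(occ)
    s_bin = {p for p, n in occ.items() if n > average}
    return s_bin or pt  # assumed fallback when no literal is above average
```

Only the literals returned by `first_layer` would then be passed to the expensive look-ahead of the second layer.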

Note that our first-layer heuristics is inspired by the same intuition as the 
first-layer heuristics in [LA97], even though it is not precisely the same. 

7 Benchmarks 

7.1 Benchmark Programs 

To evaluate the impact of the two optimization techniques presented in the
previous sections, we chose four benchmark problems: 3SAT, Blocksworld
Planning, Hamiltonian Path, and Strategic Companies.

3SAT is one of the best researched problems in AI and generally used for solving 
many other problems by translating them to 3SAT, solving the 3SAT problem, 
and transforming the solution back to the original domain: 

Let Φ be a propositional formula in conjunctive normal form (CNF)
Φ = ⋀_i (d_{i,1} ∨ ... ∨ d_{i,3}) where the d_{i,j} are literals over the
propositional variables x_1, ..., x_m. Φ is satisfiable, iff there exists a
consistent conjunction I of literals such that I ⊨ Φ.

3SAT is a classical NP-complete problem and can be easily represented in
our formalism as follows: For each propositional variable x_i (1 ≤ i ≤ m), we
add the following rule which ensures that we either assume that variable x_i or
its complement nx_i true: x_i ∨ nx_i. For each clause d_1 ∨ ... ∨ d_3 in Φ we
add the constraint :- not d'_1, ..., not d'_3. where d'_i (1 ≤ i ≤ 3) is x_j if
d_i is a positive literal x_j, and nx_j if d_i is a negative literal ¬x_j.
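
The translation just described is mechanical, as the following Python sketch shows. It takes clauses in a DIMACS-like integer encoding (an assumption of this example) and emits one rule per variable and one constraint per clause; writing the disjunction as `v` follows the usual concrete DLV input syntax, which is also assumed here. The function name encode_3sat is hypothetical.

```python
from typing import List


def encode_3sat(num_vars: int, clauses: List[List[int]]) -> List[str]:
    """Translate a CNF formula into the ASP encoding described above.

    A clause is a list of non-zero integers: j stands for x_j and -j for
    the negative literal over x_j.
    """
    # One guessing rule per variable: either x_i or its complement nx_i.
    lines = [f"x{i} v nx{i}." for i in range(1, num_vars + 1)]
    for clause in clauses:
        # The constraint fires exactly when no literal of the clause holds.
        body = ", ".join(
            f"not {'x' if lit > 0 else 'nx'}{abs(lit)}" for lit in clause
        )
        lines.append(f":- {body}.")
    return lines


for line in encode_3sat(3, [[1, -2, 3]]):
    print(line)
```

Every rule `x_i v nx_i.` has an empty body, which is why every undefined literal of this program is a PT literal.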

Hamiltonian Path (HAMPATH) is another classical NP-complete problem from
the area of graph theory:

Given an undirected graph G = (V, E), where V is the set of vertices of G
and E is the set of edges, and a node a ∈ V of this graph, does there exist a
path of G starting at a and passing through each node in V exactly once?

Suppose that the graph G is specified by using two predicates node(X) and
arc(X, Y),^ and the starting node is specified by the unary predicate start which
contains only a single tuple. Then, the following program solves the problem
HAMPATH.

% Each node has to be reached.
:- node(X), not reached(X).
reached(X) :- start(X).
reached(X) :- inPath(Y, X).

% Guess whether to take a path or not.
inPath(X, Y) ∨ outPath(X, Y) :- reached(X), arc(X, Y).

% At most one incoming/outgoing arc!
:- inPath(X, Y), inPath(X, Z), Y <> Z.
:- inPath(X, Y), inPath(Z, Y), X <> Z.

Blocksworld (BW) is a classic problem from the planning domain, and one of 
the oldest problems in AI: 

Given a table and a number of blocks in a (known) initial state and a desired 
goal state, try to reach that goal state by moving one block at a time, such 
that only unoccupied blocks are moved on top of other unoccupied blocks 
or the table. 

Figure 3 shows a simple example that can be solved in three time steps: First 
we move block c to the table, then block b on top of a, and finally c on top of b. 
Due to space restrictions we refer to [Erd99,FLMP99] for complete encodings. 






Fig. 3. Simple BW Example 



Strategic Companies (STRATCOMP), finally, is a Σ_2^P-complete problem, which
was first described in [CEG97]:

A holding owns companies C(1), ..., C(c), each of which produces some
goods. Some of these companies may jointly control another one. This is
modeled by means of predicates prod(P, C1, C2) (product P is produced
by companies C1 and C2) and contr(C, C1, C2, C3) (company C is
jointly controlled by C1, C2 and C3).

^ Predicate arc is symmetric, since undirected arcs are bidirectional.
Now, some companies should be sold, under the constraint that all goods can
still be produced, and that no company is sold which would still be controlled 
by the holding afterwards. A company is strategic, if it belongs to a strategic 
set, which is a minimal set of companies satisfying these constraints. 

The answer sets of the following natural program correspond one to one to 
the strategic sets. Checking whether any given company C is strategic is done 
by brave reasoning: “Is there any answer set containing C?” 

strategic(C1) ∨ strategic(C2) :- prod(P, C1, C2).

strategic(C) :- contr(C, C1, C2, C3), strategic(C1), strategic(C2), strategic(C3).

As in [CEG97] we assume that each product is produced by at most two 
companies and each company is jointly controlled by at most three companies. 

7.2 Benchmark Data 

For 3SAT, we have randomly generated 3CNF formulas over n variables (where n 
denotes the size as plotted on the x-axis of the graphs in Section 8) using a tool 
by Selman and Kautz [SK97]. For each size we generated 8 instances, where we 
kept the ratio between the number of clauses and the number of variables near 
the cross-over point of 4.3. 
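
For readers who want to produce comparable inputs, a uniform random 3CNF generator can be sketched as follows. This is not the tool of [SK97]; it is a stand-in written under the assumptions that each clause draws three distinct variables, that each literal is negated with probability 1/2, and that the number of clauses is the clause/variable ratio times n, rounded. The name random_3cnf and its parameters are hypothetical.

```python
import random
from typing import List, Optional


def random_3cnf(num_vars: int, ratio: float = 4.3,
                seed: Optional[int] = None) -> List[List[int]]:
    """Generate round(ratio * num_vars) random clauses over num_vars variables.

    Literals use the integer convention j for x_j and -j for its negation.
    """
    rng = random.Random(seed)  # a private generator keeps runs reproducible
    num_clauses = round(ratio * num_vars)
    clauses = []
    for _ in range(num_clauses):
        variables = rng.sample(range(1, num_vars + 1), 3)  # three distinct variables
        clauses.append([v if rng.random() < 0.5 else -v for v in variables])
    return clauses
```

Near the ratio 4.3 roughly half of the generated formulas are expected to be satisfiable, which is what makes this region hard for solvers.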

The instances for HAMPATH were generated by a tool by Patrik Simons
which has been used to compare Smodels against SAT solvers (cf. [Sim00]).^
Again, for each problem size n we generated 8 instances, always assuming node
1 as the starting node.

The blocksworld problems P3 and P4 have been employed in [Erd99] to
compare ASP systems, and can be solved in 8 and 9 steps, respectively. We aug- 
mented these by problem P5 which requires 11 steps. For each of these problems, 
we generated 8 random permutations of the input. 

For STRATCOMP, finally, we randomly generated 8 instances for each problem
size n, with n companies and n products. Each company O is controlled by
one to five companies (two groups of companies, where each of these groups
controls the same company O, must have at least one member in common), where
the actual number of companies is chosen uniformly at random. On average this
results in 1.5 contr relations per company.

All experiments were performed on a Pentium III/733 machine with 256MB
of main memory running GNU/Linux. The binaries were generated with GCC
2.95.2. The input files used for the benchmarks are available on the web at
http://www.dbai.tuwien.ac.at/proj/dlv/lpnmr01.tar.gz.

8 Experimental Results and Conclusion 

The results of our experiments are displayed in the graphs of Figures 4-7. For 
each problem domain we report two graphs: In both graphs the horizontal axis 

^ Available at http://tcs.hut.fi/Software/smodels/misc/hamilton.tar.gz

Fig. 4. 3SAT problems, average running times and look-aheads



reports a parameter representing the size of the instance, while on the vertical 
axis we report the running time (expressed in seconds) and the number of look- 
aheads, respectively, averaged over the 8 instances of the same size we have 
run (see previous section). The curves labeled by “no opt.”, “opt. 1”, “opt. 2”,
and “opt. 1+2” denote, respectively, the initial (unoptimized) version, the
look-ahead equivalence optimization, the 2-layered optimization, and the
combination of both look-ahead equivalence and 2-layered optimization.




[Figure: curves “no opt.”, “opt. 1”, “opt. 2”, and “opt. 1+2”; horizontal axis: instance]



Fig. 5. Blocksworld problems, average running times and look-aheads 



Observe first that both optimizations always bring some gain over the original 
version, as the “no optimization” curve is always on top of the other three curves 
in all graphs. 

The two optimizations have different impact, depending on the problem do- 
main: For Blocksworld, the equivalence optimization performs better than the 
2-layered approach, while for Strategic Companies and 3SAT the opposite holds. 
For Hamiltonian Path both optimizations behave roughly equal. 

Fig. 6. Strategic Companies, average running times and look-aheads




[Figure: curves “no opt.”, “opt. 1”, “opt. 2”, and “opt. 1+2”]



Fig. 7. Hamiltonian Path problems, average running times and look-aheads 



The combination of the two optimizations combines the benefits in the sense
that performance is always as good as for the best of the two strategies. Indeed,
the curve combining the two strategies (opt. 1+2) often nearly coincides with
the curve of the best of opt. 1 and opt. 2, e.g. for Blocksworld opt. 1+2 and opt. 1
are almost equal, while for Strategic Companies opt. 1+2 and opt. 2 coincide. On
Hamiltonian Path opt. 1, opt. 2, and opt. 1+2 all give the same speed-up. Finally,
in the case of 3SAT there are even better results for opt. 1+2 than for any of the
two methods alone.

Note that for opt. 2 (and opt. 1+2), the runtime and the number of
look-aheads need not correlate, as fewer look-aheads are performed but the
quality of the chosen PTs may be worse, which may lead to larger search trees.
For opt. 1 the choices remain the same and only the number of look-aheads is
reduced, so avoided look-aheads directly reduce the runtime in this case.

Thus, both optimizations turned out to be useful, and we have incorporated 
their combination in the version of DLV released in June 2001. We believe that 
this is a promising way towards the improvement of ASP systems that should be 
subject of further investigation. Indeed, besides optimizing the implementation 
of the techniques proposed in this paper, we have already planned future work
to explore other promising ways to reduce the number of look-aheads. 

Abstract. Default Logic is recognized as a powerful framework for
knowledge representation and incomplete information management. Its
expressive power is suitable for non monotonic reasoning, but the price
is a very high level of computational complexity. The purpose
of this paper is to show how heuristics such as Genetic Algorithms, Ant
Colony Optimization and Local Search can be used to build an efficient
non monotonic reasoning system.



1 Introduction 

People are used to managing and reasoning from incomplete information. Every
day they make decisions without knowing every aspect of their environment. In
many cases, this type of rough reasoning, based on natural and intuitive
knowledge approximations, appears easier and more efficient than applying
formal logical or mathematical deduction systems. From these remarks, one could
expect that an Artificial Intelligence system would be easy to conceive and
would be very efficient. Unfortunately, this is not the case. Twenty years ago,
[14] laid the foundations of Default Logic, which is nowadays recognized as one
of the best frameworks to capture and formalize common sense reasoning from
incomplete information. Default Logic provides a representation of incompletely
specified rules by means of rules with exceptions and defines a deduction
mechanism to draw conclusions even if some data are not available.
Unfortunately, this approach has a very high theoretical complexity. As a
matter of fact, computing a set of plausible conclusions (called an extension)
of a finite propositional default theory is Σ_2^P-complete [5]. The difference
in performance between human and artificial approaches stems from the fact that
human reasoning can avoid many verifications while default logic builds a set
of coherent conclusions and discards some kinds of inconsistencies.

Previous works [11,2] have already investigated this computational aspect of 
default logic and even if some systems have good performances on certain classes 
of default theories, there is no efficient system for general extension calculus. Due 
to this computational complexity, a deterministic method based on the whole 
exploration of the search space would not be efficient for non trivial theories, 
even if it uses some sophisticated pruning methods. 
We argue that non deterministic approaches [10, Id] can be (on average) more
efficient, in spite of their incompleteness.




Fig. 1. GA, ACO + LS systems for Default Logic



In this paper, we present different approaches, sometimes complementary,
that we have implemented in operational systems able to perform default
reasoning on non trivial knowledge bases. The purpose of our different
algorithms is to progressively improve a given initial configuration in order
to reach a solution. Three general approaches are considered here. Genetic
Algorithms are based on the principles of natural selection. Populations of
possible solutions evolve through a process of mutation and crossover in order
to generate better and better configurations. Ant Colony Optimization is
inspired by the observation of the collective behavior of ants when they are
seeking food. Its goal is to find an optimal path in a graph encoding the
problem to solve. Finally, Local Search relies on an incremental improvement of
a potential solution to a given problem by local moves from a configuration to
its neighbors. Local Search will be used here as an additional optimization
mechanism and combined with the previous methods. The general architecture of
our system is summarized in figure 1 and detailed in sections 3, 4 and 5.
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2 Extension Computation in Default Logic 



In Default Logic [14], knowledge is represented by a default theory $(W, D)$ where 
$W$ contains the safe knowledge (in this work it is a finite set of propositional 
formulas) and $D$ is a finite set of default rules (or defaults). A default 
$\delta = \frac{\alpha : \beta_1, \ldots, \beta_n}{\gamma}$ is an inference rule 
($\alpha$, $\gamma$ and all $\beta_i$ are propositional formulas) whose meaning is 
“if the prerequisite $\alpha$ is proved, and if for all $i = 1, \ldots, n$ each 
justification $\beta_i$ is individually consistent (in other words if nothing 
proves its negation) then one concludes the consequent $\gamma$” ¹. 

Given a default theory, Reiter has defined a set of its plausible conclusions, 
named an extension, by means of a fixpoint operator. In addition, he has given 
the following result: 

Theorem 1. [14] Let $(W, D)$ be a default theory and $E$ a formula set. We define 
$E_0 = W$ and, for all $k \geq 0$, 

$$E_{k+1} = Th(E_k) \cup \left\{ \gamma \;\middle|\; \frac{\alpha : \beta_1, \ldots, \beta_n}{\gamma} \in D,\ E_k \vdash \alpha \text{ and } E \nvdash \neg\beta_i,\ \forall i = 1, \ldots, n \right\}$$

Then, $E$ is an extension of $(W, D)$ iff $E = \bigcup_{k=0}^{\infty} E_k$. 



Note that E, the whole extension to build, is used in its own definition. This non 
constructive characterization is also an argument to choose a “guess and check” 
method as we have done in this work. 
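
The guess-and-check reading of Theorem 1 can be illustrated by a minimal Python sketch. It is not the system described in this paper: to stay self-contained it assumes that every formula is a single literal (written `"a"` or `"-a"`), so that entailment reduces to membership, and the theory used at the end is a hypothetical one chosen for illustration. A guess `E` is accepted when iterating the $E_k$ construction, with consistency tests made against `E` itself, rebuilds exactly `E`.

```python
def neg(lit):
    """Negate a literal written as "a" or "-a"."""
    return lit[1:] if lit.startswith("-") else "-" + lit

def inconsistent(lits):
    return any(neg(l) in lits for l in lits)

def entails(lits, lit):
    # An inconsistent set entails everything; None stands for an empty prerequisite.
    return lit is None or lit in lits or inconsistent(lits)

def is_extension(W, D, E):
    """Check a guessed literal set E (standing for Th(E)) against Theorem 1."""
    E, Ek = set(E), set(W)
    while True:
        # Prerequisites are tested against E_k, justifications against the guess E.
        fired = {cons for (pre, jus, cons) in D
                 if entails(Ek, pre) and not any(entails(E, neg(b)) for b in jus)}
        if fired <= Ek:          # nothing new: the union of the E_k is reached
            break
        Ek |= fired
    if inconsistent(Ek) or inconsistent(E):
        return inconsistent(Ek) and inconsistent(E)
    return Ek == E

# Hypothetical theory (an assumption): W = {a}, D = { a : b / c ,  a : -c / -b }
W = {"a"}
D = [("a", ["b"], "c"), ("a", ["-c"], "-b")]
print(is_extension(W, D, {"a", "c"}))         # True
print(is_extension(W, D, {"a", "-b"}))        # True
print(is_extension(W, D, {"a", "c", "-b"}))   # False
```

The last call shows why the guess matters: applying both defaults at once blocks each of them, so the construction only rebuilds `{a}`.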

It is important to note that a default theory may have one or multiple 
extensions, and sometimes no extension at all. We now give some additional 
material useful for understanding the rest of the paper. 



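Definition 1. Let $E$ be an extension of a default theory $(W, D)$. Its Generating 
Default Set is 

$$GD(W, D, E) = \left\{ \frac{\alpha : \beta_1, \ldots, \beta_n}{\gamma} \in D \;\middle|\; E \vdash \alpha \text{ and } E \nvdash \neg\beta_i,\ \forall i = 1, \ldots, n \right\}$$

Furthermore, given a default theory $(W, D)$, computing its extension $E$ is 
equivalent to finding its Generating Default Set $\Delta$, since 
$E = Th(W \cup cons(\Delta))$ [15]. 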



Definition 2. Given a default theory $(W, D)$, a set of defaults $\Delta \subseteq D$ 
is grounded if $\Delta$ can be ordered as a sequence $(\delta_1, \ldots, \delta_n)$ 
satisfying: $\forall i = 1, \ldots, n$, $W \cup cons(\{\delta_1, \ldots, \delta_{i-1}\}) \vdash pre(\delta_i)$. 

Lemma 1. [16] Every generating default set is grounded. 

¹ If $\delta$ is a default rule, $pre(\delta)$, $jus(\delta)$ and $cons(\delta)$ respectively denote the 
prerequisite, the set of justifications and the consequent of $\delta$. These 
definitions are also extended to sets of defaults. 

The problem we address in this paper is the Extension Computation Problem (ECP), 
which can be defined w.r.t. our heuristic approach by the following components. 

Definition 3. ECP 

- A default theory $(W, D)$. 

- The set $\mathcal{CGD} = 2^D$ of possible configurations, called candidate 
generating default sets. 

- Given a candidate generating default set $C \in \mathcal{CGD}$, the candidate 
extension associated to $C$ is 

$$CE(W, D, C) = Th(W \cup \{cons(\delta) \mid \delta \in C\})$$ ² 

Given an ECP, a solution is a candidate generating default set $C \in \mathcal{CGD}$ 
such that $CE(W, D, C)$ is an extension w.r.t. theorem 1. 

The last step of our heuristic approach consists in defining an evaluation 
function in order to compute the fitness of a candidate generating default set C 
w.r.t. the notion of solution. This evaluation relies on the four intermediate 
functions described below. 



$f_0$ rates whether the candidate extension is consistent or not. 

$$f_0(C) = \begin{cases} 0 & \text{if } CE(C) \text{ is consistent} \\ 1 & \text{otherwise} \end{cases}$$



$f_1$ rates the correctness of the candidate generating default set with respect 
to definition 1. 

$$f_1(C) = \sum_{i=1}^{n} \pi(\delta_i) \quad \text{where } n = card(D)$$

with $\pi$ defined as follows. 

δi ∈ C    CE(C) ⊢ αi    CE(C) ⊢ ¬β for some β ∈ jus(δi)    π(δi) 
true      true          true                                k 
true      true          false                               0 
true      false         true                                k 
true      false         false                               k 
false     true          true                                0 
false     true          false                               k 
false     false         true                                0 
false     false         false                               0 

$k$ is a positive number that represents a penalty given to each default that 
has been wrongly applied or wrongly not applied. 

$f_2$ rates the level of groundedness of the candidate generating default set. 

$$f_2(C) = card(F)$$

where $F$ is the maximal grounded subset of $C$. 

² We use $CE(C)$ instead of $CE(W, D, C)$ when it is clear from context. 

$f_3$ definitively checks the groundedness property. 

$$f_3(C) = \begin{cases} 0 & \text{if } C \text{ is grounded} \\ 1 & \text{otherwise} \end{cases}$$
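
To make the interplay of $f_0$ to $f_3$ concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the implementation discussed in this paper: formulas are restricted to single literals (`"a"` or `"-a"`), candidates are sets of indices into the list `D`, the strings `TOP` and `BOT` stand for the two special values $\top$ and $\bot$, and the default penalty `k=1` is arbitrary. The final combination follows the evaluation rule of Definition 4: $\top$ for an inconsistent candidate, $\bot$ when $f_1 = 0$ and $f_3 = 0$, and $f_1 - f_2$ otherwise.

```python
TOP, BOT = "TOP", "BOT"   # stand-ins for the two special evaluation values

def neg(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def inconsistent(lits):
    return any(neg(l) in lits for l in lits)

def entails(lits, lit):
    return lit is None or lit in lits or inconsistent(lits)

def grounded_part(W, D, C):
    """Maximal subset F of C whose members can be applied one after another."""
    F, changed = set(), True
    while changed:
        changed = False
        for i in C - F:
            if entails(set(W) | {D[j][2] for j in F}, D[i][0]):
                F.add(i)
                changed = True
    return F

def evaluate(W, D, C, k=1):
    C = set(C)
    CE = set(W) | {D[i][2] for i in C}           # literals generating CE(C)
    if inconsistent(CE):                         # f0(C) = 1
        return TOP
    f1 = 0
    for i, (pre, jus, _) in enumerate(D):
        applicable = entails(CE, pre) and not any(entails(CE, neg(b)) for b in jus)
        if (i in C) != applicable:               # wrongly applied or wrongly not applied
            f1 += k
    F = grounded_part(W, D, C)
    f2 = len(F)
    f3 = 0 if F == C else 1
    if f1 == 0 and f3 == 0:
        return BOT
    return f1 - f2

# Hypothetical theory (an assumption): W = {a}, D = { a : b / c ,  a : -c / -b }
W = {"a"}
D = [("a", ["b"], "c"), ("a", ["-c"], "-b")]
print(evaluate(W, D, {0}))            # BOT
print(evaluate(W, D, {0, 1}, k=2))    # 2
print(evaluate(W, D, set(), k=2))     # 4
```

With this convention smaller values are better, which is why the grounded part is subtracted from the penalties.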



The complete evaluation function is defined w.r.t. the previous components by 
taking into account experimental tuning and theoretical properties. 

Definition 4. Given a default theory $(W, D)$ and a candidate generating default 
set $C \in \mathcal{CGD}$, the evaluation of $C$ is defined by 
$eval : \mathcal{CGD} \rightarrow \mathbb{Z} \cup \{\top, \bot\}$ 

if $f_0(C) = 1$ 
then $eval(C) = \top$ 
else if $f_1(C) = 0$ and $f_3(C) = 0$ 
then $eval(C) = \bot$ 
else $eval(C) = f_1(C) - f_2(C)$ 

Theorem 2. A solution of an ECP is a set $C \in \mathcal{CGD}$ such that $eval(C) = \bot$. 

We now describe the different methods that we propose to solve an ECP. 

3 Genetic Algorithms 

Genetic Algorithms [8,6] are based on the principle of natural selection. We first 
consider a population of individuals represented by their chromosomes. Each 
chromosome represents a potential solution to an ECP. Applying a genetic 
algorithm consists in generating better and better individuals by evaluating, 
selecting and mating (crossing and mutating) them. 

A representation scheme consists of the two following elements: a chromosome 
language $\mathcal{G}$ defined by a chosen size, and an interpretation mapping 
that translates chromosomes in terms of generating default sets, which provides 
the semantics of the chromosomes. In our context, for each default 
$\frac{\alpha : \beta_1, \ldots, \beta_n}{\gamma}$ we encode in the chromosome the 
prerequisite $\alpha$ with one bit, and all justifications $\beta_1, \ldots, \beta_n$ 
jointly with one other bit. Therefore, given a set of defaults 
$D = \{\delta_1, \ldots, \delta_n\}$, the size of the chromosome is $2n$ and the 
chromosome language $\mathcal{G}$ is the regular language $(0 + 1)^{2n}$ (i.e. 
strings of $2n$ bits). Given a chromosome $G \in \mathcal{G}$, $G|_i$ denotes the 
value of $G$ at position $i$, $1 \leq i \leq 2n$. The interpretation mapping, 
defining the semantics of the chromosomes (also called its phenotype), can be 
formally described as: 

Definition 5. Given a set of defaults $D$ and a chromosome language $\mathcal{G}$, 
an interpretation mapping is defined as 

$$\phi : \mathcal{G} \times D \rightarrow \{true, false\}, \quad \phi(G, \delta_i) = \begin{cases} true & \text{if } G|_{2i-1} = 1 \text{ and } G|_{2i} = 0 \\ false & \text{otherwise} \end{cases}$$


Therefore, the chromosomes encode the candidate generating default sets as: 

Definition 6. Given a default set $D$ and a chromosome $G \in \mathcal{G}$, the 
candidate generating default set associated to $G$ is: 

$$CGD(D, G) = \{\delta_i \in D \mid \phi(G, \delta_i) = true\}$$

According to definition 3, every chromosome $G$ induces a candidate extension 
$CE(W, D, CGD(D, G))$, denoted $CE(G)$ when it is clear from the context. 
Intuitively, for a default $\delta_i$, if $G|_{2i-1} = 1$ then its prerequisite is 
considered to be in $CE(G)$, and if $G|_{2i} = 0$ no negation of its justifications 
is assumed to belong to $CE(G)$. 

Example 1. Let $(W, D) = (\{a\}, \{\ldots\})$ be a default theory. We get: 
$CGD(100011) = \ldots$ and $CE(100011) = Th(\{a, c\})$, which is indeed an 
extension, but also $CGD(101011) = \ldots$ and $CE(101011) = Th(\{a, c, \neg b\})$, 
which is not an extension. 
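
The decoding of a chromosome can be sketched in a few lines of Python. This is only an illustration, and it assumes the reading suggested by the intuition given above: a default $\delta_i$ is taken as applied exactly when bit $2i-1$ is 1 (prerequisite assumed proved) and bit $2i$ is 0 (no justification assumed contradicted). Chromosomes are represented as strings of `"0"` and `"1"`, and defaults are identified by their 1-based index; both choices are assumptions of the sketch.

```python
def phi(chromosome, i):
    """True when default number i (1-based) is applied by the chromosome."""
    prerequisite_bit = chromosome[2 * i - 2]    # position 2i-1 in 1-based numbering
    justification_bit = chromosome[2 * i - 1]   # position 2i in 1-based numbering
    return prerequisite_bit == "1" and justification_bit == "0"

def cgd(chromosome):
    """Indices of the defaults selected by a chromosome of 2n bits."""
    n = len(chromosome) // 2
    return {i for i in range(1, n + 1) if phi(chromosome, i)}

print(cgd("100011"))   # {1}
print(cgd("101011"))   # {1, 2}
```

Several chromosomes decode to the same candidate set (for instance `"100011"` and `"100001"`), so the encoding is redundant by design.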

The GA we use works with several intermediate populations, as summarized in 
figure 2. 

Fig. 2. Main steps of the GA 



Generation of the initial population $P$ is crucial to the efficiency of the GA. 
The simplest way is random generation, but this does not take into account the 
default theory of interest. A more efficient way consists in building 
chromosomes whose phenotypes are grounded (consistent) subsets of $D$. To 
randomly create a candidate, we introduce a probability $P_i$ of insertion of a 
default in the candidate generating default set, and we randomly associate to 
each default $\delta$ of $D$ a number $p_\delta \in [0, 1]$. The inductive definition 
below gives by fixpoint the candidate generating default set $\Delta_\infty$. 

- $\Delta_0 = \emptyset$, $D_0 = D$, 

- $\forall j > 0$, $\forall \delta \in D_{j-1}$ such that $W \cup cons(\Delta_{j-1}) \vdash pre(\delta)$, 

$$\Delta_j = \begin{cases} \Delta_{j-1} \cup \{\delta\} & \text{if } p_\delta < P_i \text{ and } W \cup cons(\Delta_{j-1} \cup \{\delta\}) \nvdash \bot \\ \Delta_{j-1} & \text{otherwise} \end{cases}$$

$$D_j = D_{j-1} \setminus \{\delta\}$$

Then a chromosome $G_\infty$ can be chosen randomly from $\{G \mid CGD(D, G) = \Delta_\infty\}$. 
We also guarantee that all the chromosomes of the initial population are different. 
If we add the condition 

$$\forall \beta \in jus(\delta),\ W \cup cons(\Delta_{j-1} \cup \{\delta\}) \nvdash \neg\beta$$

to the inductive part of the construction, we are able to generate an initial 
population of incrementally non-conflicting grounded phenotypes [9]. However, we 
never completely check that all defaults are non-conflicting, because our goal is 
not to directly build a generating default set. As mentioned in the introduction, 
we think that this task is too difficult for a classical algorithm, and we only 
look for good starting points for our algorithm. Note that, for a technical 
reason explained below, the size $S_p$ of the population is such that 
$\exists N_p$, $S_p = \frac{N_p (N_p + 1)}{2}$. 
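
The inductive construction of a grounded, consistent starting candidate can be sketched as follows. This is a hedged illustration rather than the implementation used in our experiments: formulas are restricted to single literals, defaults are triples `(pre, jus, cons)` identified by their index in `D`, and the names `seed_candidate`, `p_insert` and `rng` are hypothetical. Each default whose prerequisite has become derivable is examined once and inserted with probability `p_insert`, provided that the candidate stays consistent.

```python
import random

def neg(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def inconsistent(lits):
    return any(neg(l) in lits for l in lits)

def entails(lits, lit):
    return lit is None or lit in lits or inconsistent(lits)

def seed_candidate(W, D, p_insert, rng):
    """Build a grounded, consistent candidate generating default set."""
    delta = set()
    remaining = set(range(len(D)))
    progress = True
    while progress:                       # stop when no remaining prerequisite is derivable
        progress = False
        for i in sorted(remaining):
            pre, _, cons = D[i]
            current = set(W) | {D[j][2] for j in delta}
            if not entails(current, pre):
                continue                  # not derivable yet, keep it for a later pass
            remaining.discard(i)          # each default is examined only once
            progress = True
            if rng.random() < p_insert and not inconsistent(current | {cons}):
                delta.add(i)
    return delta

# Hypothetical chain: the first default only becomes applicable after the second.
W = {"a"}
D = [("b", [], "c"), ("a", [], "b")]
print(seed_candidate(W, D, 1.0, random.Random(0)))   # {0, 1}
print(seed_candidate(W, D, 0.0, random.Random(0)))   # set()
```

Adding the extra justification test of the non-conflicting variant would only require one more condition in the final `if`.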

Then, we build $P_{\prec}$, in which the chromosomes of $P$ are ordered w.r.t. 
their evaluation and two identical chromosomes are represented only once. The 
ordering $\prec$ of the individuals is the natural extension of the usual 
ordering of $\mathbb{Z}$, extended with: $\forall x \in \mathbb{Z}$, $x \prec \top$ 
and $\forall x \in \mathbb{Z}$, $\bot \prec x$. 

After that, the purpose of the selection stage is to generate a selected 
population $P_{sel}$ containing the chromosomes with the best rates according to 
the evaluation function. Furthermore, we try to keep a large diversity of 
selected chromosomes by introducing a Hamming distance $H_d$ (the Hamming distance 
is the number of differing bits between two chromosomes). $P_{sel}$ is defined as 
the $N_p$ first chromosomes of $P_{\prec}$ respecting the Hamming distance. 

We choose ranking selection to generate the parent population $P_{parents}$. 
The best chromosome in $P_{sel}$ is duplicated $k$ times, the second one $k - 1$ 
times, ..., and the last one once in $P_{parents}$. 
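
The ordering, the diversity filter and the ranking selection can be sketched together in Python. The sketch makes two assumptions that the text does not settle: "respecting the Hamming distance" is read as "at least $H_d$ bits away from every chromosome already selected", and the duplication count $k$ is taken to be the number of selected chromosomes. The strings `"TOP"` and `"BOT"` stand for $\top$ and $\bot$, and all function names are hypothetical.

```python
def rank_key(score):
    """Sort key for the extended ordering: BOT first, then integers, then TOP."""
    if score == "BOT":
        return (0, 0)
    if score == "TOP":
        return (2, 0)
    return (1, score)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def select(population, scores, n_p, h_d):
    """Best n_p distinct chromosomes that stay at least h_d bits apart."""
    ordered = sorted(dict.fromkeys(population), key=lambda g: rank_key(scores[g]))
    selected = []
    for g in ordered:
        if all(hamming(g, s) >= h_d for s in selected):
            selected.append(g)
        if len(selected) == n_p:
            break
    return selected

def ranking_parents(selected):
    """Best chromosome copied k times, next one k-1 times, ..., last one once."""
    k = len(selected)
    parents = []
    for rank, g in enumerate(selected):
        parents.extend([g] * (k - rank))
    return parents

population = ["0000", "0001", "1111", "1100", "0000"]
scores = {"0000": 1, "0001": 2, "1111": "BOT", "1100": "TOP"}
chosen = select(population, scores, n_p=3, h_d=2)
print(chosen)                         # ['1111', '0000', '1100']
print(len(ranking_parents(chosen)))   # 6
```

With $k$ selected chromosomes the parent population has $k(k+1)/2$ members, which matches the constraint stated on the population size.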

Genetic operators are now applied on $P_{parents}$ in order to generate the 
offspring $P_{children}$. They are controlled by a crossover probability $P_c$ and 
a mutation probability $P_m$. The crossover is performed as follows: 

- select randomly two chromosomes $A = (a_1, \ldots, a_{2n})$ and 
$B = (b_1, \ldots, b_{2n})$ in $P_{parents}$ 

- generate randomly a number $r \in [0, 1]$ 

- if $r < P_c$ then the crossover is possible: 

  • select a random position $p \in \{1, \ldots, 2n - 1\}$ 

  • the chromosomes $(a_1, \ldots, a_p, a_{p+1}, \ldots, a_{2n})$ and 
$(b_1, \ldots, b_p, b_{p+1}, \ldots, b_{2n})$ generate the two new chromosomes 
$(a_1, \ldots, a_p, b_{p+1}, \ldots, b_{2n})$ and 
$(b_1, \ldots, b_p, a_{p+1}, \ldots, a_{2n})$, which are put in $P_{children}$ 

- else $A$ and $B$ are put in $P_{children}$ without crossover 

Mutation is defined as follows: 

- for each chromosome $G \in P_{children}$ and for each bit $b_j$ in $G$, generate a 
random number $r \in [0, 1]$, 

- if $r < P_m$ then mutate the bit $b_j$ (i.e. flip the bit). 
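
The two genetic operators described above can be sketched in Python as follows. Chromosomes are represented as strings of `"0"` and `"1"`, and the function names and the explicit `rng` argument are assumptions made so that the sketch is reproducible; it is an illustration, not the code used for the experiments.

```python
import random

def crossover(a, b, p_c, rng):
    """One-point crossover applied with probability p_c."""
    if rng.random() < p_c:
        p = rng.randint(1, len(a) - 1)        # cut position in {1, ..., 2n-1}
        return a[:p] + b[p:], b[:p] + a[p:]
    return a, b                               # parents copied without crossover

def mutate(g, p_m, rng):
    """Flip each bit independently with probability p_m."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < p_m else bit
        for bit in g
    )

rng = random.Random(0)
child1, child2 = crossover("111111", "000000", 1.0, rng)
print(child1, child2)
print(mutate("1010", 1.0, rng))   # 0101
print(mutate("1010", 0.0, rng))   # 1010
```

Because the cut position never equals 0 or $2n$, a crossover between two different parents always mixes material from both.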

The population obtained after these operations becomes the current population 
and is the new input of the whole process described previously. This full 
process is repeated to generate successive populations, in which the best 
chromosome w.r.t. the evaluation function represents the current best solution 
to the ECP. If a chromosome $G$ such that $eval(CGD(D, G)) = \bot$ appears in a 
population, then the method stops and $CE(G)$ is an extension of the given 
default theory. Otherwise, it stops when a maximal number of populations to be 
explored is reached. 

4 Ant Colony Optimization 

Ant Colony Optimization (ACO) metaheuristics [4,3] have been inspired by the 
observation of the collective behaviour of ants when they are seeking food. Let 
us suppose that there are many ants in a nest and that we deposit food in a 
place linked to the nest by two different paths $P_1$ and $P_2$, such that $P_1$ is 
shorter than $P_2$. At the beginning of their exploration, approximately the same 
number of ants will choose one path or the other. But, after a few minutes, most 
of the ants will use the shortest path $P_1$. The emergence of this preferred 
shortest path is explained by the following points: 

- every ant deposits a little pheromone all along its walk 
- every ant directs itself by making a probabilistic choice biased by the amount 
of pheromone that it finds on each possible path 
- the pheromone evaporates 

Thus, the amount of pheromone on $P_1$ increases faster than on $P_2$, since in the 
same duration a greater number of ants choose this path. As a consequence, a 
greater number of ants choose $P_1$ since its attractivity becomes higher. By 
reinforcement, the amount of pheromone on $P_2$ decreases and that on $P_1$ 
increases, directing almost all ants to the shortest path. 

This collective behavior based on a kind of shared memory (the pheromone) 
can be used for the resolution of combinatorial problems that can be encoded 
as the search of an optimal path in a graph. For the ECP in Default Logic we 
propose the following encoding. 

Definition 7. The default graph of a default theory $(W, D)$ is 

$$G(W, D) = (D \cup \{in, out\}, A)$$

where each default becomes a vertex and $in$ and $out$ are two particular vertices 
added to the default set. $A$ is the arc set defined by 

$$\begin{aligned} A = {} & \{(in, \delta),\ \forall \delta \in D \mid W \vdash pre(\delta) \text{ and } \forall \beta \in jus(\delta),\ W \cup cons(\delta) \nvdash \neg\beta\} \\ & \cup \{(\delta, \delta') \in D \times D,\ \delta \neq \delta'\} \cup \{(\delta, out),\ \forall \delta \in D\} \cup \{(in, out)\} \end{aligned}$$

In addition, each arc $(i, j) \in A$ is weighted by an artificial pheromone that 
is a positive real number. 

Definition 8. Given a default theory $(W, D)$ and a path $P$ from $in$ to $out$ in 
$G(W, D)$, the candidate generating default set associated to $P$ is: 
$CGD(D, P) = P \cap D$. 

In the sequel, we identify vertices and defaults, and we indifferently use $P$ as 
a path in the graph or as a candidate generating default set by discarding $in$ 
and $out$ if necessary. Thus, the goal of ACO is to find a path from $in$ to $out$ 
that corresponds to a true generating default set. 

We do not systematically put an arc from $in$ to every default in $D$, since we 
want to start the search with defaults that can be applied in $W$. In addition, 
after the building phase, we remove from $A$ the arcs 

- $(\delta, \_)$ and $(\_, \delta)$ if $\exists \beta \in jus(\delta)$, 
$W \cup cons(\delta) \vdash \neg\beta$, because such a default $\delta$ (like …) can 
never be applied, so it is useless to build a path including $\delta$; 

- $(\delta, \delta')$ if $W \cup cons(\delta) \cup cons(\delta') \vdash \bot$, since 
$\delta$ and $\delta'$ are incompatible. 

This does not prevent these two defaults from appearing in the same path, but it 
reduces the search space. Obviously, much more could be done to prune the graph 
by a deeper analysis of the default theory, but this could become very expensive. 

At the beginning, the pheromone on every arc of the graph is initialized to 1 
in order to give equal chance to all paths. During the process this pheromone 
globally evaporates and increases on arcs that are on good paths in order to 
concentrate a great number of ants on the better parts of the graph. 

In order to guide each ant during its journey from in to out we also use a 
local evaluation based on the function loc. 

Definition 9. Let $P$ be a path in the graph and $\delta$ a default. We say that: 

- $\delta$ is grounded in $P$ if $W \cup cons(P) \vdash pre(\delta)$ 

- $\delta$ is compatible with $P$ if $\forall \beta \in jus(\delta)$, 
$W \cup cons(P) \cup \{\beta\}$ is consistent 

and we define 

$$loc(P, \delta) = \begin{cases} 1 & \text{if } \delta \text{ is grounded in } P \text{ and compatible with } P \\ 0 & \text{otherwise} \end{cases}$$

This local function, combined with the recorded pheromone, leads to the definition 
of the attractivity of a vertex $\delta$ for an ant staying on the last vertex of a 
partial path $P$ between $in$ and $out$. 

Definition 10. Let $G(W, D) = (V, A)$ be a default graph and $P$ a path from vertex 
$in$ to vertex $v_i$. We define 
$\mathcal{R}(v_i, P) = \{v_j \in V \setminus P \text{ s.t. } (v_i, v_j) \in A\}$, the 
set of vertices reachable from $v_i$, and the attractivity of each vertex 
$v_j \in \mathcal{R}(v_i, P)$ 

$$\mathcal{A}(v_i, v_j, P) = \frac{\varphi(i, j) * loc(P, v_j)}{\sum_{v_k \in \mathcal{R}(v_i, P)} \varphi(i, k) * loc(P, v_k)}$$

On each vertex $v_i$ during its travel from $in$ to $out$, an ant chooses the next 
vertex by a random choice among all reachable vertices $v_j$, and this choice is 
biased by the values $\mathcal{A}(v_i, v_j, P)$. By definition of the function 
$loc$, the only paths that can be explored correspond to sets of defaults that 
are grounded (in the sense of def. 2) and maximal. In this way we discard 
candidate generating default sets that obviously have no chance of being a 
solution of the ECP, and the search process focuses on “better” candidates. 
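
The biased random choice of the next vertex can be sketched in Python. The sketch assumes that the attractivity is the pheromone of the arc multiplied by the local evaluation and normalized over the reachable vertices; the pheromone table is a dictionary indexed by arcs, `loc` is passed as a function of the candidate vertex (the partial path being fixed by the caller), and returning `None` when no default is attractive is a convention of this illustration only.

```python
import random

def attractivity(pheromone, loc, current, reachable):
    """Normalized product of pheromone and local evaluation for each reachable vertex."""
    weights = {v: pheromone[(current, v)] * loc(v) for v in reachable}
    total = sum(weights.values())
    return {v: (w / total if total else 0.0) for v, w in weights.items()}

def choose_next(pheromone, loc, current, reachable, rng):
    """Draw the next vertex with probability equal to its attractivity."""
    probs = attractivity(pheromone, loc, current, reachable)
    vertices = list(probs)
    if not any(probs.values()):
        return None                       # no grounded and compatible default is left
    return rng.choices(vertices, weights=[probs[v] for v in vertices])[0]

# Hypothetical pheromone levels on two arcs leaving the vertex "in".
pheromone = {("in", "d1"): 1.0, ("in", "d2"): 3.0}
print(attractivity(pheromone, lambda v: 1, "in", ["d1", "d2"]))
# {'d1': 0.25, 'd2': 0.75}
print(choose_next(pheromone, lambda v: 0 if v == "d2" else 1,
                  "in", ["d1", "d2"], random.Random(0)))
# d1
```

A vertex whose local evaluation is 0 can never be drawn, which is how ungrounded or incompatible defaults are kept out of the explored paths.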

So, the main iteration of the whole algorithm is: 

- release $N$ ants at vertex $in$ 

- evaluate their paths $P_i$, $i = 1, \ldots, N$, from $in$ to $out$ by 
$eval(CGD(D, P_i))$ 

- increase the pheromone on the better paths: 
$\varphi(i, j) \leftarrow \varphi(i, j) + \ldots$, $\forall$ arc $(i, j)$ in the best 
$k$ paths 

- decrease (by 1%) the pheromone on every arc (evaporation) 

- if the evaluation of the best path is $\bot$ then the algorithm stops and the 
best path $P$ gives an extension $CE(CGD(D, P))$; otherwise a new iteration 
starts, until the maximum number of iterations is reached 
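
The pheromone update of one iteration can be sketched as follows. The amount added on the arcs of the best paths is not recoverable from the text above, so the constant `deposit` used here is purely an assumption, as are the function name and the representation of paths as lists of vertices already sorted from best to worst.

```python
def update_pheromone(pheromone, ranked_paths, k, deposit, evaporation=0.01):
    """Reinforce the arcs of the k best paths, then evaporate on every arc."""
    for path in ranked_paths[:k]:
        for arc in zip(path, path[1:]):       # consecutive vertices form an arc
            pheromone[arc] += deposit         # assumed constant reinforcement
    for arc in pheromone:
        pheromone[arc] *= 1.0 - evaporation   # 1% evaporation by default

pheromone = {("in", "d1"): 1.0, ("d1", "out"): 1.0, ("in", "out"): 1.0}
update_pheromone(pheromone, [["in", "d1", "out"], ["in", "out"]], k=1, deposit=0.5)
print(pheromone)
```

After this call the two arcs of the best path hold about 1.485 units of pheromone and the unused arc 0.99, so the next ants are more likely to follow the reinforced path.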



5 Local Search 

Local Search (LS) is a class of powerful methods to tackle difficult optimization 
problems. The development of modern metaheuristics such as Tabu Search or 
Simulated Annealing [1] has greatly increased their use and their efficiency. 

In this work, LS is not used as a search heuristic alone but is combined with 
GA and ACO to get better results. For an ECP, a chosen number of the best 
individuals (in the population or in the set of paths) are improved by a number of 
LS iterations. After these improvements, they are put back in the population (or 
set of paths) for the next GA or ACO iteration. This acts as an improve/repair 
stage. Therefore, results depend on the number of individuals submitted to LS 
and on the number of iterations performed. 

The LS framework can be described as follows: given a finite set of 
configurations $S$ and a cost function $f$, the purpose of the method is to 
determine an optimal $s^*$ such that $\forall s \in S$, $f(s^*) \leq f(s)$. 

Local search mainly relies on the notion of neighborhood, which allows the 
search algorithm to move from one configuration to another in order to reach 
an optimum. Therefore, given a neighborhood function $\mathcal{N} : S \rightarrow 2^S$ 
and an initial configuration $s_0$, a LS algorithm produces a series of 
configurations $(s_i)_{i \in [0..n]}$ such that $\forall i$, $s_{i+1} \in \mathcal{N}(s_i)$. 

Here, the search space is the previously defined set of candidate generating 
default sets $\mathcal{CGD}$. We keep the previous evaluation function $eval$ 
(def. 4) as cost function. We just focus here on the definition of the 
neighborhood. 

Concerning the moves in this search space, according to the definition of 
candidate extensions associated to individuals, they are defined w.r.t. the 
notion of applied default. We impose that two neighbor candidate generating 
default sets differ only by one of their defaults. The neighborhood can be defined 
as a function $\mathcal{N} : \mathcal{CGD} \rightarrow 2^{\mathcal{CGD}}$ such that 
$\mathcal{N}(C) = \{C' \in \mathcal{CGD} \mid C' = C \cup \{\delta\}, \delta \notin C \text{ or } C' = C - \{\delta\}, \delta \in C\}$. 

In order to experiment with local search techniques and their combination with 
the previously described methods, two methods are used to explore the benefits 
of two different and representative ways of managing the moves: Descent with 
Random Walk (DRW) and Tabu Search (TS). 

DRW consists in choosing at each iteration the best neighbor, which replaces 
the current configuration only if it is better. Using this approach, a local 
optimum is always reached. A random walk principle is added to escape from 
local optima by moving, with a certain probability, from the current 
configuration to another one having a worse evaluation. 
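
One step of Descent with Random Walk over the one-default neighborhood can be sketched in Python. The sketch assumes a numeric cost function (the special values $\top$ and $\bot$ would have to be mapped to numbers first), represents candidates as frozensets of default identifiers, and uses the hypothetical names `drw_step` and `p_walk`; the random move simply picks any neighbor.

```python
import random

def neighbors(C, D):
    """All candidates that differ from C by exactly one default."""
    return [C ^ {d} for d in D]          # add d if absent, remove it if present

def drw_step(current, D, cost, p_walk, rng):
    candidates = neighbors(current, D)
    if rng.random() < p_walk:            # random walk: accept any neighbor
        return rng.choice(candidates)
    best = min(candidates, key=cost)
    return best if cost(best) < cost(current) else current

# Hypothetical cost: number of defaults on which C differs from a target set.
target = frozenset({1, 3})
cost = lambda C: len(C ^ target)
D = [1, 2, 3]
C = frozenset()
for _ in range(3):
    C = drw_step(C, D, cost, 0.0, random.Random(0))
print(sorted(C))   # [1, 3]
```

With `p_walk` set to 0 the step is a plain descent and stays on a local optimum once it is reached; a small positive value lets the search leave it occasionally.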

TS consists in moving from a configuration to its best allowed neighbor, which 
is not necessarily better than the current configuration. The allowed neighbors 
are configurations that are not forbidden by the tabu list, and each current 
configuration is recorded in this list. Therefore, possible cycles are avoided 
thanks to this tabu list. A so-called aspiration condition ensures that, if the 
neighborhood contains a better configuration than the current one, then it is 
accepted as the new current configuration even if it is in the tabu list. 
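
A basic tabu search over the same neighborhood can be sketched as follows. It is a simplified illustration under assumptions: the cost is numeric, the tabu list is a bounded queue of recently visited configurations whose length is the tabu tenure, and the aspiration condition is implemented by also allowing a tabu neighbor whenever it improves on the current configuration.

```python
from collections import deque

def tabu_search(start, D, cost, iterations, tenure):
    current = frozenset(start)
    best = current
    tabu = deque([current], maxlen=tenure)    # oldest entries are forgotten
    for _ in range(iterations):
        candidates = [current ^ {d} for d in D]
        allowed = [c for c in candidates
                   if c not in tabu or cost(c) < cost(current)]   # aspiration
        if not allowed:
            break
        current = min(allowed, key=cost)      # may be worse than before
        tabu.append(current)
        if cost(current) < cost(best):
            best = current
    return best

# Hypothetical cost: number of defaults on which C differs from a target set.
target = frozenset({1, 3})
cost = lambda C: len(C ^ target)
print(sorted(tabu_search(frozenset(), [1, 2, 3], cost, iterations=5, tenure=3)))
# [1, 3]
```

Unlike the descent, the search keeps moving after the optimum is found, so the best configuration met so far has to be remembered separately.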

Of course, there exist many variants and extensions of these basic principles. 

6 Experimental Results and Conclusion 

We report here some experimental results on the GA, ACO and GA+LS systems 
that we have implemented in Sicstus Prolog 3.8.3 (we have also implemented 
ACO+LS, but due to a lack of space we only point out here some of our results). 

Diversification: Table 1 refers to the influence of the Hamming distance 
for the problem hamJoJZ that encodes, with 45 defaults, a Hamiltonian cycle 
problem as in [2]. 

Tests have been done with 30 runs per Hamming distance $H_d$, with parameters 
$S_p = 465$, $P_c = 0.8$, $P_m = 0.1$, $P_i = 0.9$, an initial incrementally 
non-conflicting grounded population and a maximum number of iterations equal to 
200. %s is the number of successful runs, ani the average number of iterations, 
at the average time in seconds for a run, ati the average time in seconds for 
one iteration, anis the average number of iterations for the successful runs and 
ats the average time in seconds for one successful run. It shows the importance 
of population diversity to increase the stability of the method (in number of 
iterations) and to speed up each iteration by decreasing the size of the 
population. It also demonstrates that a too high selective pressure ($H_d > 13$) 
strongly reduces the chances of a successful run by decreasing too much the size 
of the selected population (and then the offspring). 

LS tuning: In order to get good performance improvements from the combination 
of LS and GA, we have to adjust the parameters of the two LS algorithms. 
Concerning DRW, the tuning consists in determining the best value for the 
random walk parameter. Experiments give a value around 0.05 (the aim is to 
avoid too stochastic moves). Concerning TS, we have to adjust the length of the 
tabu list (the tabu tenure). In fact, the more important parameters are the depth 
of the LS used and the number of candidates given to LS after each GA (or ACO) 
iteration. For both LS algorithms, it appears that 5 LS iterations on the 5 best 
candidates are a good compromise to get interesting results. Results obtained 
with GA+DRW and GA+TS are given in table 2. Due to the small number of 
iterations, it appears that the tabu tenure does not really affect the results. Due 
to this specific use, DRW and TS can be considered as a way to quickly reach a 
local optimum from a good configuration. Their respective performances depend 
on the two different heuristics they use to explore the neighborhood. Moreover, 
parameters can be finely tuned according to each problem. 

Results: Table 2 gives information on the performance of our methods (with the 
notations of table 1). We report our best ACO experiments, in which we use 
$N = 100$ ants and the $k = 7$ best paths for reinforcement. We can note the great 
impact of LS on the number of iterations of GA, although only 5 individuals of 
the whole population are improved at each GA iteration. This also allows us to 
compare the performance in time of GA w.r.t. ACO and to compare our systems with 
DeRes [2]. 

Forthcoming works: Our methodology can be easily adapted to other variants of 
default logic, provided that we adapt the function $eval$ to the definition of 
extension in the targeted default logic. Moreover, our systems are able to do 
query answering in full Reiter’s Default Logic, and this will be described in a 
forthcoming paper. 

We have to mention that, on logic programs with stable model semantics (a 
subcase of default logic), the system Smodels [12] has the best performance. We 
think that this is because our approaches bring no benefit on this kind of 
problem, whose complexity (NP-complete) is lower than $\Sigma_2^p$-complete. 
However, the previous people example can only be encoded in full Reiter’s default 
logic, which is beyond the scope of Smodels. Another interesting feature of our 
approach is its ability to do a kind of anytime reasoning: when the method stops 
without giving an extension, we still get an approximate solution that can be 
useful. 

An interesting direction to explore is how we could benefit from the blocking 
set and supporting set structures introduced in [7]. They could be useful to 
define a more suitable neighborhood in the LS, to introduce a repair mechanism 
in GA, or to forbid some partial paths in ACO. Another question to deal with is 
the problem of the non-existence of extensions. Currently, if a default theory 
has no extension, our systems stop after having performed their maximal number 
of iterations, and we cannot tell whether there is an extension or not. But the 
only way to assert that a general default theory $(W, D)$ has no extension is to 
explore the whole set $\mathcal{CGD} = 2^D$, and this is not practicable for 
non-trivial cases. Nevertheless, [7] gives some sufficient conditions of 
non-existence that can be helpful in our work. 

Abstract The problem of computing X-minimal models, that is, models 
minimal with respect to a subset X of all the atoms in a theory, is very 
relevant for computing circumscriptions and diagnosis. Unfortunately, 
the problem is NP-hard. In this paper we present two novel algorithms 
for computing X-minimal models. The advantage of these new 
algorithms is that, unlike existing ones, they are capable of generating 
the models one by one. There is no need to compute a superset of all 
minimal models before finding the first X-minimal one. Our procedures 
may use local search techniques, or, alternatively, complete methods. 
We have implemented and tested the algorithms and the preliminary 
experimental results are encouraging. 



1 Introduction 

Minimal model computation is a crucial task in many reasoning systems in Artificial 
Intelligence, including Logic Programming, Nonmonotonic Reasoning, and Diagnosis 
[Re87,Mc80,dKW87]. Indeed, a considerable effort has been made to analyze the 
complexity of this task and to build efficient algorithms and systems that solve it [e.g. 
BD96,KL99,JNS00]. 

In this paper, we consider a more general computational problem: the problem of 
computing X-minimal models. When we look for X-minimal models, we search for 
models that are minimal with respect to a subset X of all the atoms in the theory. The 
task of computing minimal models is a special case of generating X-minimal models, 
taking X to be all the atoms in the theory. X-minimal models are particularly relevant 
in Diagnosis and Circumscription [Re87,Mc80,Li85]. In the logical approach to 
Diagnosis, the artifact to be diagnosed and the behavior of the system are encoded as 
a set of logical sentences called the system description and the observations, 
respectively. The components of the system are represented by constants, and their 
status (whether or not they are functioning well) is indicated by a special predicate 
called an abnormality predicate and denoted ab(.). Normally, we assume that all the 
abnormality predicates are false, that is, that the components in the system behave well. 
Once there is a fault, the theory composed of the system description and the 
observations becomes logically inconsistent if we assume that none of the 
components is abnormal. To restore consistency, we must assume that some 
components are malfunctioning. We prefer to explain the inconsistency with a 
minimal subset of abnormalities. It does not make sense to assume that a set of 
components is broken when the assumption that only a subset of this set is 
malfunctioning can explain the faulty behavior of the system. This is called “The 
Principle of Parsimony”. By this principle, we assume that only minimal subsets of 
components are faulty, or in other words, we look for models that are minimal with 
respect to the abnormality predicates. 

T. Eiter, W. Faber, and M. Truszczynski (Eds.): LPNMR 2001, LNAI 2173, pp. 322-335, 2001. 




Fig. 1. An example circuit 



System descriptions and observations are usually represented in first-order 
logic. For the sake of simplicity, we will use propositional logic in this paper. The 
algorithms presented here can be used for function-free first-order minimal model 
computation by first grounding the theories involved. Alternatively, the algorithms 
shown here can serve as a basis for developing algorithms tailored for first-order 
logic. 

As an example of model-based diagnosis using minimal models, consider the 
simple circuit shown in Figure 1. Assuming AB1 and AB2 mean that gates “not-1” and 
“not-2”, respectively, are malfunctioning, the system description (SD) for this circuit is: 

$\neg AB1 \rightarrow [In1 \leftrightarrow \neg In2]$ 

$\neg AB2 \rightarrow [In2 \leftrightarrow \neg Out]$ 

Now, assume that In1 is 0 and Out is 1, indicating that the circuit is faulty. The 
observations (OBS) in this case are $\{\neg In1, Out\}$. If we assume that both gates are 
normal and take the theory that is the union of SD, OBS, and the literals 
$\{\neg AB1, \neg AB2\}$, we get an inconsistent theory. However, if we consider the theory 
consisting of the union of SD and OBS alone, and we look at the X-minimal models of 
this theory taking X to be {AB1, AB2}, we obtain two such models, in each of which 
only one gate is abnormal. Hence the diagnosis for the above system and observations 
is that either the first or the second gate (but not both) is faulty. 
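
The two diagnoses of this example can be checked with a short brute-force sketch in Python. It is only a verification aid and not one of the algorithms proposed in this paper: it enumerates every truth assignment first, which is exactly the kind of exhaustive computation that a demand-driven procedure is meant to avoid. The function names are hypothetical.

```python
from itertools import product

ATOMS = ["AB1", "AB2", "In1", "In2", "Out"]

def satisfies(m):
    sd1 = m["AB1"] or (m["In1"] != m["In2"])   # not AB1 -> (In1 <-> not In2)
    sd2 = m["AB2"] or (m["In2"] != m["Out"])   # not AB2 -> (In2 <-> not Out)
    obs = (not m["In1"]) and m["Out"]          # observations: In1 = 0, Out = 1
    return sd1 and sd2 and obs

def x_minimal_models(atoms, satisfies, X):
    models = [dict(zip(atoms, values))
              for values in product([False, True], repeat=len(atoms))]
    models = [m for m in models if satisfies(m)]
    pos_x = lambda m: {a for a in X if m[a]}
    # Keep the models for which no other model is strictly smaller on X.
    return [m for m in models if not any(pos_x(o) < pos_x(m) for o in models)]

diagnoses = sorted(sorted(a for a in ("AB1", "AB2") if m[a])
                   for m in x_minimal_models(ATOMS, satisfies, ["AB1", "AB2"]))
print(diagnoses)   # [['AB1'], ['AB2']]
```

The theory has four models in total; the two in which both gates are abnormal are discarded because a model with a single abnormal gate exists.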

The circuit example also illustrates why a demand-driven computation of the 
X-minimal models is advantageous. Each X-minimal model explains the faulty behavior 
of the system by suggesting a minimal set of components that may be abnormal. If we 
are given the models one by one, we can test the suspect components while the next 
model is being computed. 

The paper is organized as follows. After presenting some basic definitions and known 
results in Section 2, we present two demand-driven algorithms for X-minimal model 
computation in Section 3. Both algorithms may be implemented using either local 
search methods or complete methods. X-minimal models are also very relevant in 
computing Circumscription. We elaborate on that in Section 4. In Section 5 we report 
on experiments done on the algorithms developed, and in Sections 6 and 7 we present 
related work and concluding remarks. 

2 Preliminary Definitions 

In this section we provide some basic terminology used throughout the paper. 

• Literal - propositional symbol (atom) (positive literal) or its negation, (negative 
literal). 

• Clause - disjunction of literals. 

• CNF theory - conjunctive normal form, a conjunction of clauses. All the theories 
in this paper are assumed to be in CNF. Hence by theory we mean a set of 
clauses. A theory is Horn if and only if each clause in the theory contains at most 
one positive literal. 

• Positive graph of a theory - Let T be a theory. The positive graph of T is an
undirected graph (V, E) defined as follows: V = {P | P is a positive literal in
some clause in T}, E = {(P, Q) | P and Q appear positive in the same clause}.

• Vertex cover - Let G = (V, E) be a graph. A vertex cover of G is a set V' ⊆ V such
that for each e ∈ E there is some v ∈ V' such that v ∈ e.

• Vertex cover of a theory - a vertex cover of the positive graph of the theory. Note
that if all the atoms of a vertex cover of a theory are instantiated, the theory
becomes Horn.

• Model - a truth assignment to all the atoms in the theory that makes the theory 
true. 

• Pos (M) - the set of the atoms assigned true in a model M. 

• Lit(v) - a representation of a truth assignment v as a set of literals. For example,
if v = {P=true, Q=false, R=false}, then Lit(v) = {P, ¬Q, ¬R}.

• Unit clause - clause that contains only one literal. 

• Unit propagation - the process where given a theory T, you do the following 
until there is no change in the theory (no new clauses are generated and no 
clause is deleted): you pick a unit clause C from T, delete the negation of C from 
each clause and delete each clause that contains C. 
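
The unit propagation step described above can be sketched in a few lines. This is only an illustrative sketch: the clause encoding (frozensets of non-zero integers, where k stands for atom k and -k for its negation) and the function name `unit_propagate` are assumptions made here, not part of the paper.

```python
def unit_propagate(theory):
    """Apply unit propagation until the clause set stops changing.

    A clause is a set of non-zero ints: k is atom k, -k is its negation.
    Returns (simplified clauses, literals fixed by unit clauses).
    An empty clause in the result signals a contradiction.
    """
    clauses = {frozenset(c) for c in theory}
    fixed = set()
    while True:
        unit = next((c for c in clauses if len(c) == 1), None)
        if unit is None:
            return clauses, fixed
        (lit,) = unit
        fixed.add(lit)
        new_clauses = set()
        for c in clauses:
            if lit in c:
                continue                     # clause contains the unit literal: delete it
            new_clauses.add(c - {-lit})      # delete the negation of the unit literal
        clauses = new_clauses


clauses, fixed = unit_propagate([{1}, {-1, 2}, {-2, 3, 4}, {1, 5}])
print(clauses, fixed)
```

Under this encoding, the operation written T ⊕ S in the next definition would correspond to calling `unit_propagate` on the union of the two clause sets.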

• T ⊕ S - the result of unit propagation on T ∪ S.

• {true (false)} - a set of atoms that are all assigned true (false).

• Int_X(M) - the value (integer) of a model M over a given ordered set of variables
X = {P_1, ..., P_n}, seen as a binary number where P_1 is the most significant bit (MSB
or MSV - most significant variable) and P_n is the least significant bit (LSB or
LSV - least significant variable). So, for example, if M = {P=true, Q=false,
R=false}, then Int_{P,Q}(M) = (10 in binary code) = 2 and Int_{P,Q,R}(M) = (100 in
binary code) = 4.

• X-largest (smallest) model - the model with the largest (smallest) value Int_X(M)
with respect to a given ordered set of variables X = {P_1, ..., P_n}.
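
As a small illustration of the integer value of a model, the following sketch reads a model as a binary number over an ordered set of variables. The dictionary representation of models and the name `int_x` are assumptions made for this example.

```python
def int_x(model, ordered_x):
    """Value of `model` on `ordered_x`, read as a binary number.

    `model` maps atom names to booleans; `ordered_x` lists the atoms of X
    from the most significant to the least significant one.
    """
    value = 0
    for atom in ordered_x:
        value = 2 * value + (1 if model[atom] else 0)
    return value


M = {"P": True, "Q": False, "R": False}
print(int_x(M, ["P", "Q"]))        # binary 10
print(int_x(M, ["P", "Q", "R"]))   # binary 100
```

With such a function, the X-smallest model of a list of models is simply the one minimizing `int_x`.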

X-minimal model

Let T be a theory over a set of atoms L, X ⊆ L, and M a model of T. M is an
X-super of another model M' if and only if pos(M') ∩ X is a proper subset of
pos(M) ∩ X.

M is an X-minimal model for T if and only if there is no other model M' for T such
that M is an X-super of M'. If M is an X-minimal model for X = L, it will be called
simply a minimal model. It has been shown that a Horn theory has a unique minimal
model that can be found in linear time [DG84].
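
The two definitions above translate directly into a brute-force filter. The sketch below assumes that each model is given as the set pos(M) of its true atoms; the function names are hypothetical, and the quadratic comparison is only meant to make the definition concrete, not to be efficient.

```python
def is_x_super(m, m_prime, x):
    """True iff pos(m_prime) ∩ X is a proper subset of pos(m) ∩ X."""
    return (m_prime & x) < (m & x)


def x_minimal_models(models, x):
    """Keep the models that are not an X-super of any other model."""
    return [m for m in models
            if not any(is_x_super(m, other, x) for other in models)]


models = [frozenset({"a", "c"}), frozenset({"a", "b"}), frozenset({"b"})]
print(x_minimal_models(models, frozenset({"a", "b"})))
print(x_minimal_models(models, frozenset({"c"})))
```

Note how the answer depends on X: with X = {a, b} the model {a, b} is an X-super of {b}, while with X = {c} only {a, c} is excluded.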



Find-X-minimal (T, X, M)

Input: A theory T, an ordered set X = {P_r, ..., P_1} which is a subset of the
atoms in T.

Output: true if T is satisfiable, false otherwise. In case T is satisfiable, the
output variable M is a smallest X-minimal model of T w.r.t. the ordering
{P_r, ..., P_1}.

1. If ¬sat(T, M) return false;

2. For i := r downto 1 do

   a. If isHorn(T) then M = HornMinimalModel(T), Goto 4.

   b. If sat(T ∪ {¬P_i}, M) then T := T ⊕ {¬P_i}

      else T := T ⊕ {P_i}

3. sat(T, M)

4. Return true;

Fig. 2. Algorithm Find-X-minimal



Example 2.1 Suppose a theory T has variables P_6, ..., P_0, and X = {P_6, ..., P_3}. Assume
further that T has exactly four models (ordered from the X-smallest to the
X-largest): M_1 = 0010110, M_2 = 0011000, M_3 = 0100111, and M_4 = 1100000. M_1 and M_3
are the only X-minimal models of T.

Throughout this paper, unless stated otherwise, models that agree on the truth 
assignments given to all the variables in X are considered identical. 

The following theorem is quite interesting. Its proof is based on results from
combinatorics [Bo86]. We provide in the appendix a proof suggested by Lomonosov
[Lo00].



Theorem 2.2: Let T be a theory and let X be a subset of the atoms that are used in T.
The number of X-minimal models of T is at most $\binom{n}{\lfloor n/2 \rfloor}$, where |X| = n.
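
To get a feeling for the bound, the following sketch enumerates, for small n, all subsets of X with exactly ⌊n/2⌋ elements. This mirrors the construction used in the appendix: if these sets are the X-parts of the models of a theory, no model is an X-super of another, so all of them are X-minimal and the bound is reached. The function name `middle_layer` is an assumption of this example.

```python
from itertools import combinations
from math import comb


def middle_layer(n):
    """All subsets of {1, ..., n} with exactly floor(n/2) elements."""
    return [frozenset(s) for s in combinations(range(1, n + 1), n // 2)]


for n in range(1, 7):
    layer = middle_layer(n)
    # No set in the layer is a proper subset of another one.
    assert not any(a < b for a in layer for b in layer)
    assert len(layer) == comb(n, n // 2)
    print(n, len(layer))
```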
S-X-Min (T, X)

Input: A theory T. An ordered set X = {P_r, ..., P_1} which is a subset of the
variables in T.

Output: true if T is satisfiable, false otherwise. If T is satisfiable, output one
by one all X-minimal models of T from the smallest to the largest w.r.t.
the order {P_r, ..., P_1}.

1. If ¬sat(T, M) return false.

2. ModelsTable = ∅.

3. For i := 0 to 2^r - 1 do:

   a. v := instantiation of X that equals i (P_1 least significant, P_r
      most significant).

   b. If v is not an X-super of a model in ModelsTable
      then

        if sat(T ∪ Lit(v), M)

          Output (M);

          Add M to ModelsTable;

Fig. 3. Algorithm S-X-Min



Find X-Minimal

In Figure 2 we show an algorithm for computing one X-minimal model for a
theory. The algorithm takes O(|X|) steps and uses O(|X|) calls to a satisfiability testing
procedure. A similar algorithm was shown in [BD96]. Find-X-minimal tries to assign
false to as many variables as possible and calls a Horn satisfiability checker once
there are enough instantiations so that the theory becomes Horn.

Notes on Find-X-Minimal (for future use): 

Note 1: if the theory T is not satisfiable then M is returned with the value it was 
initialized with. 

Note 2: The algorithm uses a procedure sat(T,M) that returns true iff T is satisfiable.
In case T is satisfiable, M is a model of T. Each model M is an array of booleans, M[i]
being the truth value assigned to P_i. It might be the case that M has entries for
variables not appearing in T. These variables will be assigned false by sat(T,M). We
do not always use the model that sat returns. In implementations, we can use a
version of sat that does not return a model when we do not need it.

Note 3: If sat(T,M) is complete then Find-X-Minimal is also complete; otherwise
Find-X-Minimal is not complete.
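
The core loop of Find-X-minimal can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in the paper: the brute-force `sat` function stands in for an arbitrary complete SAT procedure, clauses are sets of non-zero integers (k for atom k, -k for its negation), plain set union replaces T ⊕ S, and the Horn shortcut of step 2a is left out. Unsatisfiability is reported by returning None instead of false.

```python
from itertools import product


def sat(theory, atoms):
    """Brute-force SAT check: return a model (dict atom -> bool) or None."""
    for values in product([False, True], repeat=len(atoms)):
        m = dict(zip(atoms, values))
        if all(any(m[abs(l)] == (l > 0) for l in c) for c in theory):
            return m
    return None


def find_x_minimal(theory, ordered_x, atoms):
    """Return the Int_X-smallest model of `theory`, or None if unsatisfiable.

    `ordered_x` lists the atoms of X from most to least significant.
    """
    if sat(theory, atoms) is None:
        return None
    theory = list(theory)
    for p in ordered_x:
        # Prefer p = false; fall back to p = true if that is inconsistent.
        if sat(theory + [{-p}], atoms) is not None:
            theory.append({-p})
        else:
            theory.append({p})
    return sat(theory, atoms)


T = [{1, 2}, {-1, 3}]
print(find_x_minimal(T, [1, 2], [1, 2, 3]))
print(find_x_minimal(T, [2, 1], [1, 2, 3]))
```

The two calls use different orderings of X = {1, 2} and return different X-minimal models, which illustrates that the result depends on the chosen variable ordering.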
3 New Algorithms

In this section we present two algorithms for computing X-minimal models.

The correctness of these algorithms can be proved only if a complete SAT procedure
is assumed. Otherwise the algorithms are not complete and may generate a model that
is not X-minimal.

The first algorithm, called S-X-min, goes over all possible instantiations of X, using
an ordering with the property that whenever a model is not X-minimal, it must
be an X-super of an X-minimal model already generated. S-X-min is shown in
Figure 3.

J-X-min (T, X)

Input: A theory T, a subset X of all the variables in T. We assume
that |X| = r.

Output: true if T is satisfiable, false otherwise. In case T is
satisfiable, output one by one all X-minimal models of T from
the smallest to the largest. Each model is an array of booleans
M; M[i] is the truth value assigned to P_i.

1. Let P_{n-1}, ..., P_0 be an ordering on the variables in T such that
   the first r variables are all the variables from X. Variable P_0 will be
   considered the least significant and the variable P_{n-1} will be
   considered the most significant.

2. If Find-X-minimal(T, {P_{n-1}, ..., P_{n-r}}, M) = false return false;

3. ModelsTable = ∅.

4. Output (M); Add M to ModelsTable.

5. Let i be the index of the least significant variable that satisfies:

   1. P_i ∈ X

   2. M[i] = false

   3. P_i is more significant than another variable P_j such that P_j ∈ X
      and M[j] = true.

   If there is no such variable return true.

6. M[i] := true;

7. If Find-X-minimal(T ⊕ Lit(M[n-1, ..., i]), {P_{n-1}, ..., P_{n-r}}, M) = false then
   goto 5.

8. If M is not an X-super of a model in ModelsTable then goto 4. Else
   goto 5.

Fig. 4. Algorithm J-X-Min
Lemma 3.1: Algorithm S-X-Min is correct.

The proof is omitted due to space constraints. The basic argument is that a model
which is not X-minimal must be an X-super of a model that is X-smaller. Since the
models are generated in increasing integer (Int_X) order, all and only the X-minimal
models will be generated.
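
The enumeration strategy of S-X-min can be sketched as below. The sketch assumes a hypothetical oracle `sat_with(v)` that plays the role of sat(T ∪ Lit(v), M): given a tuple of r booleans for X (most significant first), it returns a model extending v, or None. The initial satisfiability check of step 1 is omitted. For illustration, the oracle is built from the four models listed in Example 2.1, with X being the first four bits.

```python
def s_x_min(sat_with, r):
    """Yield the X-minimal models in increasing Int_X order."""
    table = []                                   # X-parts of the models output so far
    for i in range(2 ** r):
        v = tuple(bool((i >> (r - 1 - k)) & 1) for k in range(r))
        # v is an X-super of w if w's true positions are a proper subset of v's.
        is_super = any(v != w and all(a or not b for a, b in zip(v, w))
                       for w in table)
        if is_super:
            continue
        m = sat_with(v)
        if m is not None:
            table.append(v)
            yield m


MODELS = ["0010110", "0011000", "0100111", "1100000"]   # models of Example 2.1


def sat_with(v):
    prefix = "".join("1" if b else "0" for b in v)
    return next((m for m in MODELS if m.startswith(prefix)), None)


print(list(s_x_min(sat_with, 4)))
```

Because the instantiations are visited in increasing integer order, every non-minimal candidate is recognized as an X-super of an entry that is already in the table.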

The second algorithm that we present is algorithm J-X-min, shown in Figure 4. For
each theory T there is a (possibly empty) set of all the X-minimal models of T. The
n X-minimal models in this set can be ordered as M_1, ..., M_n, where M_1 is the smallest
X-minimal model and M_n is the largest. The algorithm is based on the following
observation:

Lemma 3.2: Algorithm Find-X-minimal (Figure 2) always returns the smallest
X-minimal model with respect to the given variable ordering, if such a model exists.

The proof is omitted due to space constraints. Intuitively, the lemma is true
because Find-X-minimal tries to assign false to as many variables as possible and
backtracks on this choice at the least significant bit possible.

Once we find the first, smallest, X-minimal model, we search for the next one.
Suppose that |X|=4 and the smallest model is 0100.... There is no point in checking
whether truth assignments starting with 0101 or 0110 are models, because it is obvious
that they are X-supers of the first model. Hence the algorithm will "jump" to check whether
truth assignments starting with 1000 may be models. Hence the "J" in the algorithm
name.

Theorem 3.3: Algorithm J-X-Min is correct.

Proof (sketch): Let T be a theory and let X be a subset of its variables. If T is
inconsistent, the algorithm is clearly correct. Assume T is consistent. First, observe
that models generated in Step 7 are always generated from the X-smallest to the
X-largest. Let M_0, ..., M_k be all the X-minimal models of T, ordered from the X-smallest to
the X-largest according to an ordering (P_{n-1}, ..., P_{n-r}) of X. We will show by induction on
0 ≤ t ≤ k that the t-th model that J-X-min outputs is M_t.

Base case: Follows from Lemma 3.2.

Case t>0: Assume by contradiction that M' ≠ M_t is the t-th model that the algorithm
outputs. By the induction hypothesis, it must be the case that Int_X(M_{t-1}) < Int_X(M') <
Int_X(M_t).

Let us look at the last time that Step 5 of the algorithm is executed just before
model M' is sent to output.

Let i be the index that the algorithm finds at this last step.

Let M* be an instantiation of the variables in T defined as follows:
M*[n-1,...,i+1] = M_{t-1}[n-1,...,i+1], M*[i] = true, M*[i-1,...,n-r] = {false}. It is clear that
Int_X(M_{t-1}) < Int_X(M*) < Int_X(M_t). There are 2 cases:

1. Int_X(M_{t-1}) < Int_X(M') < Int_X(M*). Then there is a contradiction, because in this case
M' must be an X-super of M_{t-1} (some of the variables P_{i-1}, ..., P_{n-r} that were
false become true instead), hence the algorithm will not output it.

2. Int_X(M*) < Int_X(M') < Int_X(M_t). Then the following must be true:

2.1 M'[n-1,...,i] = M_t[n-1,...,i]

2.2 Int(M'[i-1,...,n-r]) < Int(M_t[i-1,...,n-r]).

But by Lemma 3.2, when we execute Find-X-minimal with X = (P_{i-1}, ..., P_{n-r}) it
returns the smallest possible value of {P_{i-1}, ..., P_{n-r}}, and therefore M' cannot be a
minimal model that satisfies 2.1 & 2.2, a contradiction.

It is left to show that M_k is the last model sent to output. This is obvious because all
the models generated after M_k are not X-minimal and hence must be X-supers of at
least one of the X-minimal models that are already in ModelsTable. □

Example 3.4 Consider again the theory T from Example 2.1, and suppose J-X-min is
called with X = {P_6, ..., P_3}. The first (and smallest) X-minimal model returned by
Find-X-minimal is M_1 = 0010110. After that, at Step 5, we choose i=5 and call
Find-X-minimal(T ⊕ {¬P_6, P_5}, X, M). Find-X-minimal will return model M_3 = 0100111,
which is the second and last X-minimal model of T. At Step 5 after that we choose i=6 and
call Find-X-minimal(T ⊕ {P_6}, X, M). Find-X-minimal will return model M_4 =
1100000. M_4 is an X-super of M_3, and therefore will not be sent to output. In the next
iteration, no i will be found, and the algorithm will terminate. You can see that out of
16 possible assignments to X, only 3 models were considered by J-X-min.
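
The jump strategy traced in this example can be sketched as follows. The sketch works on X-parts only (tuples of r booleans, most significant first, so position 0 here corresponds to the most significant variable) and assumes a hypothetical oracle `find_min(prefix)` standing in for Find-X-minimal(T ⊕ Lit(prefix), X, M): it returns the X-part of the Int_X-smallest model whose X-part starts with `prefix`, or None if there is none. The oracle below is built from the four models of Example 2.1; it is an illustration, not the paper's implementation.

```python
def is_super(v, w):
    """v is an X-super of w: w's true positions are a proper subset of v's."""
    return v != w and all(a or not b for a, b in zip(v, w))


def j_x_min(find_min, r):
    """Yield the X-parts of the X-minimal models, smallest first."""
    table = []
    m = find_min(())
    emit = m is not None
    while m is not None:
        if emit:                                   # step 4
            table.append(m)
            yield m
        # Step 5: least significant false position above some true position.
        true_pos = [k for k in range(r) if m[k]]
        cands = [k for k in range(r)
                 if not m[k] and true_pos and k < true_pos[-1]]
        if not cands:
            return
        i = cands[-1]
        prefix = m[:i] + (True,)                   # step 6
        found = find_min(prefix)                   # step 7
        if found is None:
            m, emit = prefix + m[i + 1:], False    # keep M with M[i] = true
        else:                                      # step 8
            m, emit = found, not any(is_super(found, w) for w in table)


MODELS = ["0010110", "0011000", "0100111", "1100000"]   # sorted by Int_X


def find_min(prefix):
    for model in MODELS:
        x_part = tuple(c == "1" for c in model[:4])
        if x_part[:len(prefix)] == prefix:
            return x_part
    return None


for x_part in j_x_min(find_min, 4):
    print("".join("1" if b else "0" for b in x_part))
```

On this input the oracle is consulted three times (for the empty prefix, for 01, and for 1), matching the three models considered in the example.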



4 Computing Circumscription 

In this section we will show how our algorithms can be used for computing 
circumscription. First, we will formulate deduction in circumscription in model- 
theoretic terminology. We will use propositional logic version of definitions from 
[GPP89,Li85,Mc80]. In this section we assume that T is some propositional theory 
and that there is a partition of all the atoms in T into three disjoint sets of atoms: P,Z, 
and Q. 

Definition 4.1 [GPP89]. For any two models M and N of T we write M ≤ N mod (P,Z)
if models M and N differ only in how they interpret predicates from P and Z and if
pos(M) ∩ P is a subset of pos(N) ∩ P. We say that a model M is (P,Z)-minimal if there is
no model N such that N < M mod (P,Z) (i.e., such that N ≤ M but not M ≤ N).

That is, in order for M to be (P,Z)-minimal, the following must hold: there is no
model N that grants the same truth value as M to all the atoms in Q and for which the set
of all atoms in P to which N assigns true is a proper subset of the set of all atoms in P
to which M assigns true; and we don't care about the truth assignment these models
give to atoms in Z.

Theorem 4.2 [GPP89]. For any clause c, we say that c follows from (T,P,Z) by 
circumscription if and only if c is true in every (P,Z)-minimal model of T. 

Example 4.3 Assume T is the following theory, having the intuitive meaning that
children normally like McDonald's, and Itamar is a child:

Child ∧ ¬Ab → LikesMD.

Child.

Suppose we want to know whether Itamar likes McDonald's. The intuition is that
the answer is yes. Using classical logic, LikesMD does not follow from this theory.
However, taking P = {Ab} and Z = {LikesMD}, we get that LikesMD follows from
(T,P,Z) by circumscription. To see this, note that T has exactly three models:
M_1 = {Child, Ab, LikesMD}, M_2 = {Child, Ab, ¬LikesMD}, and M_3 = {Child, ¬Ab, LikesMD}.
M_3 is the only (P,Z)-minimal model of T.

The algorithms developed here can be used for a demand-driven computation of
(P,Z)-minimal models of T, in the following way:

1. Use some backtracking algorithm to find all consistent (with T) truth
assignments for the variables in Q.

2. For each assignment v found in step 1, compute one by one the P-minimal
models of T ∪ Lit(v). Each model generated is a (P,Z)-minimal model of T.

A demand-driven computation is useful here because it may help us refute 
conclusions before all the models are generated (if a fact does not follow from some 
model it certainly does not follow from all of the models). 
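
The following sketch checks Example 4.3 by brute force. It does not implement the demand-driven procedure above; it simply enumerates all models, keeps the (P,Z)-minimal ones according to Definition 4.1, and tests whether LikesMD holds in each of them. The clause encoding (pairs of atom name and required truth value) and all function names are assumptions of this example.

```python
from itertools import product

ATOMS = ["Child", "Ab", "LikesMD"]
# Example 4.3 in clause form: (¬Child ∨ Ab ∨ LikesMD) and (Child).
THEORY = [[("Child", False), ("Ab", True), ("LikesMD", True)],
          [("Child", True)]]


def models(theory, atoms):
    """Enumerate all models of a clause set as dicts atom -> bool."""
    for values in product([False, True], repeat=len(atoms)):
        m = dict(zip(atoms, values))
        if all(any(m[a] == sign for a, sign in clause) for clause in theory):
            yield m


def pz_minimal(all_models, p, q):
    """Models M with no model N that agrees with M on Q and has pos(N)∩P ⊂ pos(M)∩P."""
    def pos_p(m):
        return {a for a in p if m[a]}

    def same_q(m, n):
        return all(m[a] == n[a] for a in q)

    return [m for m in all_models
            if not any(same_q(m, n) and pos_p(n) < pos_p(m) for n in all_models)]


ms = list(models(THEORY, ATOMS))
minimal = pz_minimal(ms, p={"Ab"}, q={"Child"})
print(len(ms), minimal)
print(all(m["LikesMD"] for m in minimal))   # does LikesMD follow by circumscription?
```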




Fig. 6. Growing X size 




5 Experiments 

We have tested algorithm J-X-min on a suite of hard randomly
generated CNF problems (theories). The problems are difficult random 3CNF
problems as described in [SKC94] and [SK96]. The algorithm was tested on theories
of size 50/218 (50 symbols, 218 clauses). The theories were taken from SATLIB
[SAT00]. The algorithms were implemented in Java on a PC with a Pentium 3 600
MHz processor and 128 MB of memory. We chose Java because we
intended to build an object-oriented library of tools for computing minimal
models. We used a Java implementation of walksat [SKC94] as an (incomplete) SAT procedure.
Since Java is a relatively inefficient language in terms of running time, we did not
pay much attention to the absolute running time of the algorithms in these
experiments. However, we do believe that running time is an important factor, and we
plan to implement the algorithms in C in order to improve their time performance.

We ran three experiments:

1. Compute all the minimal models of the theories and compare the results to the
results of a complete procedure (the dlv system [KL99]).

2. Check the growth in run-time as a function of an increasing size of X.

3. Compute all the minimal models using different symbol-ordering heuristics.

The first experiment has shown that in spite of using an incomplete SAT algorithm, we
succeeded in computing all and only the minimal models of the theories. The
set of minimal models computed by our algorithm was exactly the same as the set of
minimal models computed by dlv. We expect, though, that on much larger theories an
incomplete algorithm will be less accurate.

Results of the second experiment are shown in Figure 6. As expected, the run-time of
the algorithm (given in seconds) grows as X grows. It is encouraging to see that
the first X-minimal model is generated in about 25% of the time it takes to compute
all the models, since one of the goals of this project was to output the models on a
demand-driven basis.

The ordering of theory variables before calling algorithm J-X-min might be crucial.
Once enough instantiations are made so that the theory is Horn, a linear algorithm can
be called upon to finish the minimal model computation. In the third experiment we
computed X-minimal models where X is the set of all variables in the theory. On
each theory, we tested J-X-min five times: four times with a random symbol
order, and once with a symbol order in which the vertex cover of the theory comes first in
the ordering. In general, the problem of finding a minimum-cardinality vertex cover
of a graph is NP-hard. The greedy heuristic procedure that we used for finding a vertex
cover simply removes the node with maximum degree from the graph and continues
with the reduced graph until all nodes are disconnected. The set of all nodes removed
is a vertex cover.
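
A minimal sketch of this greedy heuristic is given below, together with a helper that builds the positive graph of a theory as defined in Section 2. The string encoding of literals (a leading "-" marks negation), the alphabetical tie-breaking rule, and the function names are assumptions of this example, not details taken from the implementation used in the experiments.

```python
from itertools import combinations


def positive_graph_edges(theory):
    """Edges between atoms that appear positively in the same clause."""
    edges = set()
    for clause in theory:
        positives = sorted(l for l in clause if not l.startswith("-"))
        edges.update(combinations(positives, 2))
    return edges


def greedy_vertex_cover(edges):
    """Repeatedly remove a maximum-degree node until no edge is left."""
    remaining = {frozenset(e) for e in edges}
    cover = []
    while remaining:
        degree = {}
        for e in remaining:
            for v in e:
                degree[v] = degree.get(v, 0) + 1
        best = max(sorted(degree), key=degree.get)   # ties broken alphabetically
        cover.append(best)
        remaining = {e for e in remaining if best not in e}
    return cover


theory = [["a", "b", "c"], ["c", "d"], ["-a", "d"]]
print(greedy_vertex_cover(positive_graph_edges(theory)))
```

Placing the returned atoms first in the variable ordering means that, once they are instantiated, every clause has at most one remaining positive literal, so the Horn shortcut applies.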

We compared the run-time results of J-X-min in vertex cover order and in
random order. For the random order, we took the best, worst, and mean run time. We
divided the results according to the size of the vertex cover of the theory. The results of
this experiment are summarized in Figure 7 (run time is in seconds). We can see that
in general the run time of J-X-min does not grow as the size of the vertex cover
grows. We explain this finding by the fact that we use the walksat algorithm. When
the walksat algorithm is checking the satisfiability of an inconsistent theory, it stops
after a time-out (measured by the number of flips and restarts). This time-out is almost
constant, and hence the run time of J-X-min is more or less constant across theories
with different vertex cover sizes. We can see that the vertex cover heuristic is quite
good: always better than the worst and mean run-time of J-X-min with random
order, and usually even better than the best.







Chen Avin and Rachel Ben-Eliyahu - Zohary 



6 Related Work 

During the last few years there have been several studies regarding the problem of
minimal model computation. Ben-Eliyahu and Dechter [BD96] have presented
several algorithms for computing minimal models, all of them different from the ones
proposed here. One limitation of the algorithms presented there is that they produce
a superset of all minimal models, while we produce the minimal models one by one.
Ben-Eliyahu and Palopoli [BP97] have presented a polynomial algorithm for finding
a minimal model, but it works only for a subclass of all CNF theories and it finds only
one minimal model.

The systems dlv [KL99] and smodels [JNS00] compute stable models of
disjunctive logic programs. If integrity constraints are allowed in the programs, every
knowledge base can be represented as a disjunctive logic program such that the set of
all minimal models of the first coincides with the set of all stable models of the second.
An advantage of our approach compared to theirs is that our algorithms compute
X-minimal models one at a time, while using their approach we have to compute first all
minimal models and then select the set of X-minimal ones. Another difference is that
our implementation uses local search techniques.



7 Conclusions 

The task of computing X-minimal models is very relevant in many knowledge
representation systems, and particularly in diagnosis and circumscription. We have
presented two new algorithms to perform this task. The algorithms are demand
driven, and can be implemented either by using incomplete local search procedures or
by using complete procedures. In the future we plan to combine the algorithms
presented here with the algorithm developed by [Be00] in order to produce a
distributed algorithm for computing X-minimal models.

Appendix

In the following text, unless otherwise stated, we assume some fixed theory T and
some fixed subset X of all atoms in T, where |X| = x.

We take two truth assignments to be different only if they disagree on the set of
atoms X.

The question we want to raise is: What is the maximum number of different
X-minimal models that such a theory T may have?

Definition 1: An assignment (truth assignment) X-chain (or X-chain for short) is an
ordered set of assignments where each assignment is an X-super of the next
assignment.

Definition 2: An X-chain set is a set of X-chains such that each possible truth
assignment to X appears in exactly one X-chain. So there are exactly 2^x assignments
in all the X-chains together.

We define C_r as a chain set that contains exactly r assignment chains.

Lemma 1: For any theory T and set X of atoms in T such that T has a total of j
different X-minimal models, and for any X-chain set C_r for T, r ≥ j.

Proof: We will prove by contradiction that j cannot be larger than r. It is obvious that
two different X-minimal models of T must belong to different X-chains of C_r (one
X-minimal model cannot be an X-super of another X-minimal model by definition, and
therefore they cannot be in the same X-chain). If j > r then there must be two different
X-minimal models that belong to the same X-chain in C_r. A contradiction.

Lemma 2: If there is a theory T having exactly j X-minimal models and an X-chain
set C_r where r=j, then for any theory T' and for any set of atoms X' in T' such that
|X'| = |X|, T' may have at most j X'-minimal models.

Proof: It is easy to see that since T has an X-chain set of size r, T' must also have an
X'-chain set of size r. Assume that T' has j' X'-minimal models with j' > j. By Lemma 1,
r ≥ j'. But we also know that r=j, so we get that j ≥ j'. A contradiction.



Theorem: The maximum number of X-minimal models of a theory is $\binom{n}{\lfloor n/2 \rfloor}$, where |X| = n.

Proof: First, we will show that there is some theory T having exactly $\binom{n}{\lfloor n/2 \rfloor}$ X-minimal
models for some subset X of all the atoms in T with |X| = n. We define T to be the theory
that has exactly $\binom{n}{\lfloor n/2 \rfloor}$ models, where each model has a different set of $\lfloor n/2 \rfloor$ true atoms that
belong to some fixed set X of atoms with |X| = n, while all the other atoms in the model are
assigned false. In this case each model is also an X-minimal model.

Next we will show that there is an X-chain set of size $\binom{n}{\lfloor n/2 \rfloor}$. This will complete the
proof because it means (according to Lemma 2) that this is the maximum number of X-minimal
models that a theory may have. We divide the $2^n$ different assignments to X into the
following sets, which reflect the number of atoms in X assigned true by the assignment: [...]

We will build the chains in the set as follows. We start with [...] chains, each having one
assignment that belongs to the set [...]. We then add the assignments in the set [...] to the
existing chains, possibly starting a new chain, and so on. The X-chains grow by
creating a complete matching in bipartite graphs where the set of vertices V is the union of the
assignments from [...] and the assignments from [...]. The edges connecting vertices
in this graph reflect the X-super relation, and we can show that in this case we can find a
complete matching.
Abstract. In this paper, we continue to explore many-valued disjunctive
logic programs with probabilistic semantics. In particular, we newly
introduce the least model state semantics for such programs. We show
that many-valued disjunctive logic programs under the semantics of
minimal models, perfect models, stable models, and least model states can
be unfolded to equivalent classical disjunctive logic programs under the
respective semantics. Thus, existing technology for classical disjunctive
logic programming can be used to implement many-valued disjunctive
logic programming. Using these results on unfolding many-valuedness,
we then give many-valued fixpoint characterizations for the set of all
minimal models and the least model state. We also describe an iterative
fixpoint characterization for the perfect model semantics under finite
local stratification.



1 Introduction 

In a previous paper [5], we introduced many-valued disjunctive logic programs
with probabilistic semantics. In particular, we defined minimal, perfect, and
stable models for such programs, and showed that they have the same properties
as their classical counterparts. For example, perfect and stable models are
always minimal models. Under local stratification, the perfect model semantics
coincides with the stable model semantics. Moreover, we also showed that some
special cases of propositional many-valued disjunctive logic programming under
minimal, perfect, and stable model semantics have the same complexity as their
classical counterparts.

In this paper, we continue this line of research on many-valued disjunctive 
logic programming with probabilistic semantics. The central topic of the present 
paper is to elaborate algorithms for many-valued disjunctive logic programming. 
One way of obtaining such algorithms is to translate many-valued disjunctive 
logic programs into classical formalisms, and to work with existing algorithms 
for the classical formalisms. Another way is to simply develop completely new 
algorithms. 

In this paper, we follow both directions. We first show that many-valued
disjunctive logic programs under minimal models, perfect models, stable models,
and least model states can be unfolded to equivalent classical disjunctive logic
programs under the respective semantics. Thus, existing technology for classical
disjunctive logic programming can be used to implement many-valued
disjunctive logic programming.

Using these results on unfolding many-valuedness, we then develop new
many-valued fixpoint characterizations for the semantics of minimal models,
least model states, and perfect models under finite local stratification.

It is important to point out that our many-valued disjunctive logic programs 
have a probabilistic semantics in probabilities over possible worlds. Furthermore, 
the truth values of all clauses are truth-functionally defined on the truth values
of atoms. This gives our many-valued disjunctive logic programs both nice
computational properties (compared to purely probabilistic approaches) and a
nice probabilistic semantics. The latter is expressed in the fact that our many- 
valued disjunctive logic programming under the minimal model and the least 
model state semantics is an approximation of purely probabilistic disjunctive 
logic programming. 

We showed in [6,7] that many-valued definite logic programming with this 
probabilistic semantics has a model and fixpoint characterization and a proof 
theory similar to classical definite logic programming. Moreover, special cases 
of many-valued logic programming with this semantics were shown to have the
same computational complexity as their classical counterparts. Interestingly, our 
approach in [6,7] is closely related to van Emden’s quantitative deduction [19], 
which interprets the implication connective as conditional probability, while our 
work uses the material implication. 

The main contributions of this paper can be summarized as follows. 

• We introduce the least model state semantics for positive many-valued dis- 
junctive logic programs with probabilistic semantics. 

• We show that many-valued disjunctive logic programs under minimal model, 
perfect model, stable model, and least model state semantics can be unfolded 
to equivalent classical disjunctive logic programs under the respective seman- 
tics. 

• We provide fixpoint characterizations for the set of all minimal models and 
the least model state of positive many-valued disjunctive logic programs. 

• We describe an iterative fixpoint characterization for the perfect model of 
many-valued disjunctive logic programs that have a finite local stratification. 

Note that proofs of all results are given in the extended paper [8]. 

2 Preliminaries 

In this section, we recall some necessary definitions and results from [5].

2.1 Probabilistic Background 

Let $\Phi$ be a first-order vocabulary that contains a set of function symbols and
a set of predicate symbols (as usual, constant symbols are function symbols of
arity zero). Let $\mathcal{X}$ be a set of variables. We define terms by induction as follows.
A term is a variable from $\mathcal{X}$ or an expression of the form $f(t_1, \ldots, t_k)$, where $f$
is a function symbol of arity $k \ge 0$ from $\Phi$ and $t_1, \ldots, t_k$ are terms. We define
classical formulas by induction as follows. If $p$ is a predicate symbol of arity
$k \ge 0$ from $\Phi$ and $t_1, \ldots, t_k$ are terms, then $p(t_1, \ldots, t_k)$ is a classical formula
(called atom). If $F$ and $G$ are classical formulas, then also $\neg F$ and $(F \wedge G)$.
Literals, positive literals, and negative literals are defined as usual. We define
probabilistic formulas inductively as follows. If $F$ is a classical formula and $c$ is a
real number from $[0, 1]$, then $\mathrm{prob}(F) \ge c$ is a probabilistic formula (called atomic
probabilistic formula). If $F$ and $G$ are probabilistic formulas, then also $\neg F$ and
$(F \wedge G)$. We use $(F \vee G)$ and $(F \leftarrow G)$ to abbreviate $\neg(\neg F \wedge \neg G)$ and $\neg(\neg F \wedge G)$,
respectively, and adopt the usual conventions to eliminate parentheses. Terms
and formulas are ground iff they do not contain any variables. Substitutions,
ground substitutions, and ground instances of formulas are defined as usual.

A classical interpretation $I$ is a subset of the Herbrand base $HB_\Phi$ over $\Phi$.
A variable assignment $\sigma$ assigns to each $x \in \mathcal{X}$ an element from the Herbrand
universe $HU_\Phi$ over $\Phi$. It is by induction extended to terms by $\sigma(f(t_1, \ldots, t_k)) =
f(\sigma(t_1), \ldots, \sigma(t_k))$ for all terms $f(t_1, \ldots, t_k)$. The truth of classical formulas $F$
in $I$ under $\sigma$, denoted $I \models_\sigma F$, is inductively defined as follows (we write $I \models F$
when $F$ is ground):

• $I \models_\sigma p(t_1, \ldots, t_k)$ iff $p(\sigma(t_1), \ldots, \sigma(t_k)) \in I$.

• $I \models_\sigma \neg F$ iff not $I \models_\sigma F$, and $I \models_\sigma (F \wedge G)$ iff $I \models_\sigma F$ and $I \models_\sigma G$.

A probabilistic interpretation (or p-interpretation) $p = (\mathcal{I}, \mu)$ consists of a set $\mathcal{I}$
of classical interpretations (called possible worlds) and a discrete probability
function $\mu$ on $\mathcal{I}$ (that is, a mapping $\mu$ from $\mathcal{I}$ to the real interval $[0, 1]$ such that
all $\mu(I)$ with $I \in \mathcal{I}$ sum up to 1 and that the number of all $I \in \mathcal{I}$ with $\mu(I) > 0$
is countable). The truth value of a formula $F$ in a p-interpretation $p$ under a
variable assignment $\sigma$, denoted $p_\sigma(F)$, is defined as the sum of all $\mu(I)$ such that
$I \in \mathcal{I}$ and $I \models_\sigma F$ (we write $p(F)$ when $F$ is ground). The truth of probabilistic
formulas $F$ in $p$ under $\sigma$, denoted $p \models_\sigma F$, is defined as follows (we write $p \models F$
when $F$ is ground):

• $p \models_\sigma \mathrm{prob}(F) \ge c$ iff $p_\sigma(F) \ge c$.

• $p \models_\sigma \neg F$ iff not $p \models_\sigma F$, and $p \models_\sigma (F \wedge G)$ iff $p \models_\sigma F$ and $p \models_\sigma G$.

The probabilistic formula $F$ is true in $p$, or $p$ is a model of $F$, denoted $p \models F$,
iff $F$ is true in $p$ under all variable assignments $\sigma$. The p-interpretation $p$ is a
model of a set of probabilistic formulas $\mathcal{F}$, denoted $p \models \mathcal{F}$, iff $p$ is a model of all
$F \in \mathcal{F}$. A set of p-interpretations $P$ is a model of $F$ (resp., $\mathcal{F}$), denoted $P \models F$
(resp., $P \models \mathcal{F}$), iff every member of $P$ is a model of $F$ (resp., $\mathcal{F}$).
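
For ground formulas, the definitions above can be illustrated with a small sketch. The representation is an assumption made for this example only: a possible world is a frozenset of ground atoms (strings), a p-interpretation is a dictionary from worlds to probabilities, and classical formulas are atoms or nested tuples built with "not" and "and". Exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction


def holds(world, f):
    """Truth of a ground classical formula in a world (a set of atoms)."""
    if isinstance(f, str):
        return f in world
    if f[0] == "not":
        return not holds(world, f[1])
    if f[0] == "and":
        return holds(world, f[1]) and holds(world, f[2])
    raise ValueError(f"unknown connective: {f[0]!r}")


def truth_value(p, f):
    """p(F): total probability of the worlds in which F is true."""
    return sum(mu for world, mu in p.items() if holds(world, f))


def satisfies(p, f, c):
    """p |= prob(F) >= c for a ground classical formula F."""
    return truth_value(p, f) >= c


p = {frozenset(): Fraction(1, 4),
     frozenset({"a"}): Fraction(1, 4),
     frozenset({"a", "b"}): Fraction(1, 2)}
print(truth_value(p, "a"), truth_value(p, ("and", "a", "b")))
print(satisfies(p, ("not", "a"), Fraction(1, 2)))
```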

2.2 Positively Correlated Probabilistic Interpretations 

We restrict our attention to the following kind of p-interpretations (that is, we 
assume another axiom besides the axioms of probability). A positively correlated 
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probabilistic interpretation (or pep-interpretation) is a p-interpretation p such 
that 



p{A A B) = mm{p{A) , p{B)) for all A,Bg HB^, . (1) 

Note that the condition p{A A B) = min(p(^),p(i3)) is just assumed for 
ground atoms A and B. It brings probabilistic logics over possible worlds closer 
to truth- functional logics. We do not assume that (1) always holds in the part 
of the real world that we want to model. The axiom (1) is simply a technical 
assumption that carries us to a form of many-valued logic programming that 
approximates probabilistic logic programming. It makes a global probabilistic 
semantics over possible worlds match with the truth-functionality behind logic 
programming techniques. Differently from many other axioms, the axiom (1) is 
compatible with logical implication. Note that pcp-interpretations are uniquely 
determined by the truth values they give to all ground atoms [5], and thus they
can be identified with mappings from $HB_\Phi$ to $[0, 1]$.

A probabilistic formula $F$ is a pc-consequence of a set of probabilistic formulas
$\mathcal{F}$, denoted $\mathcal{F} \models^{pc} F$, iff each pcp-interpretation that is a model of $\mathcal{F}$ is also a
model of $F$.

2.3 Many-Valued Disjunctive Logic Programs

We are now ready to define many-valued disjunctive logic programs. We start
by defining many-valued disjunctive logic program clauses, which are special 
atomic probabilistic formulas that are interpreted under pcp-interpretations. A 
many-valued disjunctive logic program clause (or mvd-clause) is a probabilistic 
formula of the kind 



$$\mathrm{prob}(A_1 \vee \cdots \vee A_l \leftarrow B_1 \wedge \cdots \wedge B_m \wedge \neg C_1 \wedge \cdots \wedge \neg C_n) \ge c,$$

where $A_1, \ldots, A_l, B_1, \ldots, B_m, C_1, \ldots, C_n$ are atoms, $l, m, n \ge 0$, and $c \in [0, 1]$ is
rational. It is abbreviated by $(A_1 \vee \cdots \vee A_l \leftarrow B_1, \ldots, B_m, \mathit{not}\ C_1, \ldots, \mathit{not}\ C_n)[c, 1]$.
We call $A_1 \vee \cdots \vee A_l$ its head, $B_1, \ldots, B_m, \mathit{not}\ C_1, \ldots, \mathit{not}\ C_n$ its
body, and $c$ its truth value. It is positive (resp., definite) iff $n = 0$ (resp., $l = 1$ and
$n = 0$). It is called an integrity clause iff $l = 0$, a fact iff $l > 0$ and $m + n = 0$, and
a rule iff $l > 0$ and $m + n > 0$. A many-valued disjunctive logic program (or mvd-program)
$P$ is a finite set of mvd-clauses. A positive (resp., definite) mvd-program
is a finite set of positive (resp., definite) mvd-clauses. Given an mvd-program $P$,
we identify $\Phi$ with the vocabulary $\Phi(P)$ of all function and predicate symbols in
$P$. Denote by $HB_P$ the Herbrand base over $\Phi(P)$, and by $ground(P)$ the set of
all ground instances of members of $P$ w.r.t. $\Phi(P)$. The set of truth values of $P$,
denoted $TV(P)$, is the least set of rational numbers $\{\frac{0}{n}, \frac{1}{n}, \ldots, \frac{n}{n}\}$ that
contains all the rational numbers in $P$, where $n \ge 2$ is a natural number. Denote
by $\mathcal{I}_P$ the set of all pcp-interpretations over $HB_P$ into $TV(P)$.

The following result shows that the truth of a ground mvd-clause under 
a pcp-interpretation is a function of the truth values of the contained ground 
atoms. 

Theorem 2.1. Let $C = (A_1 \vee \cdots \vee A_l \leftarrow B_1, \ldots, B_m, \mathit{not}\ C_1, \ldots, \mathit{not}\ C_n)[c, 1]$
be a ground mvd-clause, and let $p$ be a pcp-interpretation. Then, $p$ is a model
of $C$ iff

$$\max\Bigl(\max_{1 \le i \le l} p(A_i),\ \max_{1 \le i \le n} p(C_i)\Bigr) \ge c - 1 + \min_{1 \le i \le m} p(B_i).$$
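
The condition of Theorem 2.1 can be checked mechanically. The following is a minimal sketch, not taken from the paper: the function name `satisfies`, the dictionary encoding of a pcp-interpretation, and the sample clause are all hypothetical. It also assumes the usual conventions that an empty maximum is 0 and an empty minimum is 1, and it uses exact fractions to avoid rounding problems.

```python
from fractions import Fraction as Fr

def satisfies(p, heads, pos, neg, c):
    """Check the condition of Theorem 2.1 for one ground mvd-clause.

    p maps ground atoms to truth values; atoms missing from p count as 0.
    Assumed conventions: an empty max is 0 and an empty min is 1.
    """
    value = lambda atom: p.get(atom, Fr(0))
    lhs = max([value(a) for a in heads] + [value(x) for x in neg], default=Fr(0))
    rhs = c - 1 + min([value(b) for b in pos], default=Fr(1))
    return lhs >= rhs

# Hypothetical ground clause (a <- b, not e)[0.9, 1].
clause = (["a"], ["b"], ["e"], Fr(9, 10))

p_ok = {"a": Fr(7, 10), "b": Fr(8, 10), "e": Fr(5, 10)}   # 0.7 >= 0.9 - 1 + 0.8
p_bad = {"a": Fr(6, 10), "b": Fr(8, 10), "e": Fr(5, 10)}  # 0.6 <  0.7
p_neg = {"b": Fr(8, 10), "e": Fr(7, 10)}                  # e alone reaches 0.7

print(satisfies(p_ok, *clause), satisfies(p_bad, *clause), satisfies(p_neg, *clause))
# prints: True False True
```

The third call illustrates that a negated body atom behaves like an additional head atom in this test, which is exactly what the left-hand maximum expresses.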

We finally define queries and their correct and tight answers. A many-valued
query (or simply query) is an expression $\exists(F)[t, 1]$, where $F$ is a ground classical
formula and $t$ is a variable or a rational number from $[0, 1]$. Given the queries
$\exists(F)[c, 1]$ and $\exists(F)[x, 1]$ to an mvd-program $P$, where $c \in [0, 1]$ and $x \in \mathcal{X}$, we
define their desired semantics in terms of correct and tight answers with respect
to a set $\mathcal{M}(P)$ of models of $P$ as follows. The correct answer for $\exists(F)[c, 1]$
to $P$ under $\mathcal{M}(P)$ is Yes if $c \le \inf\{p(F) \mid p \in \mathcal{M}(P)\}$ and No otherwise. The
tight answer for $\exists(F)[x, 1]$ to $P$ under $\mathcal{M}(P)$ is the substitution $\theta = \{x/d\}$,
where $d = \inf\{p(F) \mid p \in \mathcal{M}(P)\}$.

In the rest of this subsection, we recall minimal, perfect, and stable models 
from [5] as some ways of describing the meaning of an mvd-program. 

Minimal Models. For pcp-interpretations $p$ and $q$, we say $p$ is a subset of $q$,
denoted $p \subseteq q$, iff $p(A) \le q(A)$ for all $A \in HB_\Phi$. We use $p \subset q$ as an abbreviation
for $p \subseteq q$ and $p \ne q$. A model $p$ of an mvd-program $P$ is a minimal model of
$P$ iff no model of $P$ is a proper subset of $p$. Denote by $MM(P)$ the set of all
minimal models of $P$.

Perfect Models. We first define the two relations $\prec$ and $\preceq$ on ground atoms.
For an mvd-program $P$, the priority relation $\prec$ and the auxiliary relation $\preceq$ are
the least binary relations on $HB_P$ with the following properties. If $ground(P)$
contains an mvd-clause with the atom $A$ in the head and the negative literal
$\mathit{not}\ C$ in the body, then $A \prec C$. If $ground(P)$ contains an mvd-clause with the
atom $A$ in the head and the positive literal $B$ in the body, then $A \preceq B$. If
$ground(P)$ contains an mvd-clause with the atoms $A$ and $A'$ in the head, then
$A \preceq A'$. If $A \prec B$, then $A \preceq B$. If $A \preceq B$ and $B \preceq C$, then $A \preceq C$. If $A \prec B$
and $B \preceq C$, then $A \prec C$. If $A \preceq B$ and $B \prec C$, then $A \prec C$. We say that the
ground atom $B$ has higher priority than the ground atom $A$ iff $A \prec B$.

We next define the preference relation $\ll$ on pcp-interpretations as follows.
For pcp-interpretations $p$ and $q$, we say $p$ is preferable to $q$, denoted $p \ll q$,
iff $p \ne q$ and for each $A \in HB_P$ with $p(A) > q(A)$ there is some $B \in HB_P$ with
$q(B) > p(B)$ and $A \prec B$. We write $p \underline{\ll} q$ iff $p \ll q$ or $p = q$.

A model $q$ of an mvd-program $P$ is a perfect model of $P$ iff no model of $P$ is
preferable to $q$. We use $PM(P)$ to denote the set of all perfect models of $P$.

Not every mvd-program has a perfect model. We next define locally stratified 
mvd-programs without integrity clauses, which always have a perfect model. 

An mvd-program $P$ without integrity clauses is locally stratified iff $HB_P$ can
be partitioned into sets $H_1, H_2, \ldots$ (called strata) such that for each mvd-clause

$$(A_1 \vee \cdots \vee A_l \leftarrow B_1, \ldots, B_m, \mathit{not}\ C_1, \ldots, \mathit{not}\ C_n)[c, 1] \in ground(P),$$




there exists an $i \ge 1$ such that all $A_1, \ldots, A_l$ belong to $H_i$, all $B_1, \ldots, B_m$ belong
to $H_1 \cup \cdots \cup H_i$, and all $C_1, \ldots, C_n$ belong to $H_1 \cup \cdots \cup H_{i-1}$. For such a
partition $H_1, H_2, \ldots$ of $HB_P$ (called a local stratification of $P$) and every $i \ge 1$,
we use $P_i$ to denote the set of all mvd-clauses from $ground(P)$ whose heads
belong to $H_i$.

Stable Models. An extended many-valued disjunctive logic program clause
(or emvd-clause) is an expression $(A_1 \vee \cdots \vee A_l;\ d \leftarrow B_1, \ldots, B_m, \mathit{not}\ C_1, \ldots, \mathit{not}\ C_n)[c, 1]$,
where $A_1, \ldots, A_l, B_1, \ldots, B_m, C_1, \ldots, C_n$ are atoms, $l, m, n \ge 0$, $c \in [0, 1]$
is rational, and $d \in [0, 1]$. It is true in a pcp-interpretation $p$ under a variable
assignment $\sigma$ iff

$$\max\Bigl(\max_{1 \le i \le l} p_\sigma(A_i),\ \max_{1 \le i \le n} p_\sigma(C_i),\ d\Bigr) \ge c - 1 + \min_{1 \le i \le m} p_\sigma(B_i).$$

Thus, emvd-clauses may also contain truth-value constants in their heads. 

For an mvd-program P and a pcp-interpretation q, the expression P/q de- 
notes the set of emvd-clauses that is obtained from ground{P) by replacing 
every mvd-clause $(A_1 \vee \cdots \vee A_l \leftarrow B_1, \ldots, B_m, \mathit{not}\ C_1, \ldots, \mathit{not}\ C_n)[c, 1]$ by the
emvd-clause

$$(A_1 \vee \cdots \vee A_l;\ \max_{1 \le i \le n} q(C_i) \leftarrow B_1, \ldots, B_m)[c, 1].$$

A pcp-interpretation q is a stable model of an mvd-program P iff q is a 
minimal model of P/q. We use SM{P) to denote the set of all stable models 
of P. 
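
The definition of stable models can be tried out by brute force on a very small ground program. The sketch below is only an illustration under stated assumptions: the two-clause program, the function names, and the choice to enumerate all interpretations into an assumed $TV(P) = \{0, 0.2, \ldots, 1\}$ are hypothetical and not part of the paper. An empty minimum is taken as 1 and an empty maximum as 0.

```python
from fractions import Fraction as Fr
from itertools import product

def reduct(program, q):
    """P/q: each negative body is replaced by the head constant d = max q(C_i)."""
    return [(heads, pos, max([q[x] for x in neg], default=Fr(0)), c)
            for heads, pos, neg, c in program]

def emvd_true(p, heads, pos, d, c):
    """Truth condition for a ground emvd-clause (empty min taken as 1)."""
    lhs = max([p[a] for a in heads] + [d])
    rhs = c - 1 + min([p[b] for b in pos], default=Fr(1))
    return lhs >= rhs

def stable_models(program, atoms, tv):
    """Brute force: q is stable iff q is a minimal model of P/q."""
    space = [dict(zip(atoms, vals)) for vals in product(tv, repeat=len(atoms))]
    found = []
    for q in space:
        red = reduct(program, q)
        is_model = lambda p: all(emvd_true(p, *cl) for cl in red)
        # All interpretations that are proper subsets of q.
        smaller = (p for p in space
                   if p != q and all(p[a] <= q[a] for a in atoms))
        if is_model(q) and not any(is_model(p) for p in smaller):
            found.append(q)
    return found

# Hypothetical ground program: (a <- not b)[0.8, 1] and (b <- not a)[0.6, 1].
program = [(["a"], [], ["b"], Fr(4, 5)), (["b"], [], ["a"], Fr(3, 5))]
tv = [Fr(k, 5) for k in range(6)]  # assumed TV(P) = {0, 0.2, ..., 1}

print(stable_models(program, ["a", "b"], tv))
```

For this toy program the only interpretation returned assigns 0.8 to a and 0 to b. The enumeration is exponential in the number of ground atoms, so it is meant only for checking the definition on tiny examples.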

2.4 Example 

We now give an illustrative example. The following mvd-program P is taken 
from [5] (r, s, a, b, and c are constant symbols, and R, X, Y, and Z are variables): 

$$
\begin{aligned}
P = \{\, & (\mathit{closed}(r) \vee \mathit{closed}(s) \leftarrow)[.5, 1],\ (\mathit{road}(r, a, b) \leftarrow)[.8, 1],\ (\mathit{road}(s, b, c) \leftarrow)[.7, 1], \\
& (\mathit{reach}(X, Y) \leftarrow \mathit{road}(R, X, Y),\ \mathit{not}\ \mathit{closed}(R))[.9, 1], \\
& (\mathit{reach}(X, Z) \leftarrow \mathit{reach}(X, Y),\ \mathit{reach}(Y, Z))[.9, 1] \,\}.
\end{aligned}
$$

The set of truth values of $P$ is given by $TV(P) = \{0, 0.1, \ldots, 1\}$.

A query to $P$ may be given by $\exists(\mathit{reach}(a, c))[U, 1]$, where $U$ is a variable.
To determine its tight answer, we must specify a set of models of $P$. Some
models $p_1$, $p_2$, $p_3$, and $p_4$ of $P$ are shown in Table 1 (we assume $p_i(A) = 0$ for
all unmentioned $A \in HB_P$). More precisely, the models $p_1$, $p_2$, $p_3$, and $p_4$ are
some minimal models of $P$, whereas the models $p_1$ and $p_2$ are the only perfect
and stable models of the locally stratified mvd-program $P$. The tight answer for
$\exists(\mathit{reach}(a, c))[U, 1]$ to $P$ under $\{p_1, p_2, p_3, p_4\}$ and $\{p_1, p_2\}$ is given by $\{U/0\}$
and $\{U/0.5\}$, respectively.

Table 1. Some models of the mvd-program P

       closed(r)  closed(s)  road(r,a,b)  road(s,b,c)  reach(a,b)  reach(b,c)  reach(a,c)
p_1    0.5        0          0.8          0.7          0.7         0.6         0.5
p_2    0          0.5        0.8          0.7          0.7         0.6         0.5
p_3    0          0.6        0.8          0.7          0.7         0           0
p_4    0          0.7        0.8          0.7          0           0           0
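
The tight answers stated for the query on reach(a, c) can be recomputed directly from the rows of Table 1. The following sketch is illustrative only: the helper names are hypothetical, the query formula is restricted to a single ground atom, and the infimum is taken as a minimum because the model sets are finite and nonempty.

```python
from fractions import Fraction as Fr

ATOMS = ["closed(r)", "closed(s)", "road(r,a,b)", "road(s,b,c)",
         "reach(a,b)", "reach(b,c)", "reach(a,c)"]

def row(*tenths):
    """Build an interpretation from one table row given in tenths."""
    return dict(zip(ATOMS, (Fr(t, 10) for t in tenths)))

# Rows of Table 1 as printed there.
p1 = row(5, 0, 8, 7, 7, 6, 5)
p2 = row(0, 5, 8, 7, 7, 6, 5)
p3 = row(0, 6, 8, 7, 7, 0, 0)
p4 = row(0, 7, 8, 7, 0, 0, 0)

def tight_answer(models, atom):
    """d = inf{p(F) | p in M(P)} for a ground atom F and a finite, nonempty M(P)."""
    return min(p.get(atom, Fr(0)) for p in models)

def correct_answer(models, atom, c):
    """Yes iff c does not exceed the tight bound."""
    return "Yes" if c <= tight_answer(models, atom) else "No"

print(tight_answer([p1, p2, p3, p4], "reach(a,c)"))  # 0
print(tight_answer([p1, p2], "reach(a,c)"))          # 1/2
```

The two printed values match the substitutions {U/0} and {U/0.5} given in the text.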



3 Least Model States 

We now define least model states for positive mvd-programs, which are a generalization
of their classical counterparts by Minker and Rajasekar [12,4].

In the sequel, we use $A^\alpha$ to abbreviate atomic probabilistic formulas of the
form $\mathrm{prob}(A) \ge \alpha$. Given an mvd-program $P$, the disjunctive Herbrand base
for $P$, denoted $DHB_P$, is the set of all disjunctions of atomic probabilistic formulas
$A_1^{\alpha_1} \vee \cdots \vee A_k^{\alpha_k}$ with pairwise distinct ground atoms $A_1, \ldots, A_k \in HB_P$,
$\alpha_1, \ldots, \alpha_k \in TV(P) \setminus \{0\}$, and $k \ge 1$. A disjunctive Herbrand state (or state) $S$ is
a subset of $DHB_P$. A state $S$ is a model state of a positive mvd-program $P$ iff

$$\{D \in DHB_P \mid S \cup P \models^{pc} D\} \subseteq S.$$

A model $p$ of a state $S$ is a minimal model of $S$ iff no model of $S$ is a proper
subset of $p$. We use $MM(S)$ to denote the set of all minimal models of $S$. The
canonical form (resp., expansion) of a state $S$, denoted $can(S)$ (resp., $exp(S)$),
is defined by:

$$can(S) = \{D \in S \mid \forall D' \in S,\ D' \ne D : \{D'\} \not\models^{pc} D\},$$
$$exp(S) = \{D \in DHB_P \mid \exists D' \in S : \{D'\} \models^{pc} D\}.$$

A state $S$ is in canonical form (resp., expanded) iff $S = can(S)$ (resp., $S = exp(S)$).

The following theorem shows that the intersection of a set of model states of 
a positive mvd-program P is also a model state of P. 

Theorem 3.1. Let $P$ be a positive mvd-program, and let $\mathcal{S}$ be a set of model
states of $P$. Then, the intersection of all $S \in \mathcal{S}$ is a model state of $P$.

Clearly, each positive mvd-program $P$ has the model state $DHB_P$. Thus,
there exist model states of $P$, and the intersection of all of them is the least
model state of $P$.

Definition 3.2. Denote by $MS_P$ the least model state of a positive mvd-program $P$.

The following result shows that $MS_P$ is the set of all disjunctions $D \in DHB_P$
that are pc-consequences of $P$. Moreover, it shows that this set coincides with
the set of all disjunctions $D \in DHB_P$ that are true in all minimal models of $P$.

Theorem 3.3. Let $P$ be a positive mvd-program. Then,

(a) $MS_P = \{D \in DHB_P \mid P \models^{pc} D\}$.

(b) $MS_P = \{D \in DHB_P \mid MM(P) \models D\}$.

As shown in [6,7], definite mvd-programs $P$ have a unique least model
$M_P$. The next theorem shows that for such $P$, the model $M_P$ corresponds to
$can(MS_P)$.

Theorem 3.4. Let $P$ be a definite mvd-program, and let $M_P$ be the least model
of $P$. Then, $can(MS_P) = S_P$, where $S_P = \{A^\alpha \in DHB_P \mid \alpha = M_P(A)\}$.

We give an illustrative example. 

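Example 3.5. Consider the following positive mvd-program $P$:

$$
\begin{aligned}
P = \{\, & (\mathit{closed}(r) \vee \mathit{closed}(s) \leftarrow)[.5, 1],\ (\mathit{road}(r, a, b) \leftarrow)[.8, 1],\ (\mathit{road}(s, b, c) \leftarrow)[.7, 1], \\
& (\mathit{reach}(X, Y) \vee \mathit{closed}(R) \leftarrow \mathit{road}(R, X, Y))[.9, 1], \\
& (\mathit{reach}(X, Z) \leftarrow \mathit{reach}(X, Y),\ \mathit{reach}(Y, Z))[.9, 1] \,\}.
\end{aligned}
$$

The set of truth values of $P$ is given by $TV(P) = \{0, 0.1, \ldots, 1\}$. The canonical
form of the least model state $MS_P$ of $P$ is given as follows:

$$
\begin{aligned}
can(MS_P) = \{\, & \mathit{closed}^{0.5}(r) \vee \mathit{closed}^{0.5}(s),\ \mathit{road}^{0.8}(r, a, b),\ \mathit{road}^{0.7}(s, b, c), \\
& \mathit{reach}^{0.7}(a, b) \vee \mathit{closed}^{0.7}(r),\ \mathit{reach}^{0.6}(b, c) \vee \mathit{closed}^{0.6}(s), \\
& \mathit{reach}^{0.5}(a, c) \vee \mathit{closed}^{0.7}(r) \vee \mathit{closed}^{0.6}(s) \,\}.
\end{aligned}
$$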

4 Unfolding Many-Valuedness

In this section, we give translations of mvd-programs under the semantics of min- 
imal models, perfect models, stable models, and least model states into classical 
disjunctive logic programs under the respective classical semantics. 

4.1 Program Translations 

We now formally define translations of mvd-programs and pcp-interpretations 
into classical disjunctive logic programs and classical interpretations, respec- 
tively. 

Given an mvd-program $P$, the many-valued alphabet for $P$, denoted $\Phi^{*}(P)$,
is obtained from $\Phi(P)$ by replacing each predicate symbol $p$ by the new predicate
symbols $p^\alpha$ with $\alpha \in TV(P) \setminus \{0\}$. The many-valued Herbrand base for $P$, denoted
$HB_P^{*}$, is the Herbrand base over $\Phi^{*}(P)$. For atoms $A = p(t_1, \ldots, t_k)$ and $\alpha \in TV(P)$,
the atom $A^\alpha$ over $\Phi^{*}(P)$ is defined as $p^\alpha(t_1, \ldots, t_k)$.

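Every mvd-program $P$ is translated into the following classical disjunctive
logic program $Tr(P) = Tr_1(P) \cup Tr_2(P)$ over $\Phi^{*}(P)$ (based on Theorem 2.1):

$$
\begin{aligned}
Tr_1(P) = \{\, & A_1^{\alpha} \vee \cdots \vee A_l^{\alpha} \leftarrow B_1^{\beta_1}, \ldots, B_m^{\beta_m}, \mathit{not}\ C_1^{\alpha}, \ldots, \mathit{not}\ C_n^{\alpha} \ \mid \\
& (A_1 \vee \cdots \vee A_l \leftarrow B_1, \ldots, B_m, \mathit{not}\ C_1, \ldots, \mathit{not}\ C_n)[c, 1] \in P, \\
& \beta_1, \ldots, \beta_m \in TV(P),\ \alpha = c - 1 + \min(\beta_1, \ldots, \beta_m) > 0 \,\}, \\
Tr_2(P) = \{\, & A^{\alpha} \leftarrow A^{\beta} \ \mid\ A^{\alpha}, A^{\beta} \in HB_P^{*},\ \alpha < \beta \,\}.
\end{aligned}
$$

Every pcp-interpretation $p$ is translated into the following classical interpretation:

$$Tr(p) = \{A^{\alpha} \in HB_P^{*} \mid p(A) \ge \alpha\}.$$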

The following example illustrates the above program translation. 

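Example 4.1. The mvd-program $P$ given in Section 2.4 is translated into the
classical disjunctive logic program $Tr(P) = Tr_1(P) \cup Tr_2(P)$, where $Tr_1(P)$ is
given by:

$$
\begin{aligned}
Tr_1(P) = \{\, & \mathit{closed}^{0.5}(r) \vee \mathit{closed}^{0.5}(s) \leftarrow;\ \mathit{road}^{0.8}(r, a, b) \leftarrow;\ \mathit{road}^{0.7}(s, b, c) \leftarrow; \\
& \mathit{reach}^{0.1}(X, Y) \leftarrow \mathit{road}^{0.2}(R, X, Y),\ \mathit{not}\ \mathit{closed}^{0.1}(R); \\
& \mathit{reach}^{0.2}(X, Y) \leftarrow \mathit{road}^{0.3}(R, X, Y),\ \mathit{not}\ \mathit{closed}^{0.2}(R);\ \ldots; \\
& \mathit{reach}^{0.9}(X, Y) \leftarrow \mathit{road}^{1}(R, X, Y),\ \mathit{not}\ \mathit{closed}^{0.9}(R); \\
& \mathit{reach}^{0.1}(X, Z) \leftarrow \mathit{reach}^{0.2}(X, Y),\ \mathit{reach}^{0.2}(Y, Z); \\
& \mathit{reach}^{0.1}(X, Z) \leftarrow \mathit{reach}^{0.2}(X, Y),\ \mathit{reach}^{0.3}(Y, Z); \\
& \mathit{reach}^{0.1}(X, Z) \leftarrow \mathit{reach}^{0.3}(X, Y),\ \mathit{reach}^{0.2}(Y, Z); \\
& \mathit{reach}^{0.2}(X, Z) \leftarrow \mathit{reach}^{0.3}(X, Y),\ \mathit{reach}^{0.3}(Y, Z);\ \ldots; \\
& \mathit{reach}^{0.9}(X, Z) \leftarrow \mathit{reach}^{1}(X, Y),\ \mathit{reach}^{1}(Y, Z) \,\}.
\end{aligned}
$$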

Note that $Tr_1(P)$ may be quite large. It generally has a manageable size when
there are few truth values in $TV(P)$ and few positive literals in the bodies of
clauses in $P$.
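
The growth of $Tr_1(P)$ can be made concrete by enumerating the classical rules generated for a single mvd-clause. The sketch below is an illustration under assumptions: atoms are treated as opaque strings, an annotated atom $A^\alpha$ is encoded as the pair `(A, alpha)`, and all function names are hypothetical. Only positive values of $\beta_i$ are enumerated, since $\beta_i = 0$ can never yield $\alpha > 0$.

```python
from fractions import Fraction as Fr
from itertools import product

def truth_values(n):
    """The set {0/n, 1/n, ..., n/n} as exact fractions."""
    return [Fr(k, n) for k in range(n + 1)]

def tr1_clause(heads, pos, neg, c, tv):
    """Classical rules that Tr_1 yields for one mvd-clause.

    Each rule is a triple (head, positive body, negated body) of annotated atoms.
    """
    rules = []
    for betas in product([t for t in tv if t > 0], repeat=len(pos)):
        alpha = c - 1 + min(betas, default=Fr(1))  # empty body: alpha = c
        if alpha > 0:
            head = [(a, alpha) for a in heads]
            body = list(zip(pos, betas))
            naf = [(x, alpha) for x in neg]
            rules.append((head, body, naf))
    return rules

def tr_interpretation(p, tv):
    """Tr(p): all annotated atoms A^alpha with 0 < alpha <= p(A)."""
    return {(atom, t) for atom, v in p.items() for t in tv if 0 < t <= v}

tv = truth_values(10)
road_rules = tr1_clause(["reach(X,Y)"], ["road(R,X,Y)"], ["closed(R)"], Fr(9, 10), tv)
reach_rules = tr1_clause(["reach(X,Z)"], ["reach(X,Y)", "reach(Y,Z)"], [], Fr(9, 10), tv)
print(len(road_rules), len(reach_rules))  # 9 81
```

With eleven truth values, one positive body literal gives 9 rules and two give 81, which illustrates why the number of positive body literals dominates the size of the translation.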

4.2 Unfolding Results 

Minimal Models. The following lemma shows that every mvd-program $P$ is
equivalent to its translation $Tr(P)$, under all pcp-interpretations into $TV(P)$.

Lemma 4.2. Let $P$ be an mvd-program, and let $p$ be a pcp-interpretation into
$TV(P)$. Then, $p$ is a model of $P$ iff $p$ is a model of $Tr(P)$.

The next lemma shows that pcp-interpretations $p$ into $TV(P)$ can be identified
with their translation $Tr(p)$, concerning classical disjunctive logic programs
over $HB_P^{*}$.

Lemma 4.3. Let $P$ be an mvd-program. Let $L$ be a classical disjunctive logic
program over the alphabet $\Phi^{*}(P)$, and let $p$ be a pcp-interpretation into $TV(P)$.
Then, $p$ is a model of $L$ iff $Tr(p)$ is a model of $L$.

The following theorem shows that Tr translates mvd-programs under the 
minimal model semantics into equivalent classical disjunctive logic programs 
under the minimal model semantics. It can be proved using the two lemmata 
above. 

Theorem 4.4. Let $P$ be an mvd-program, and let $p$ be a pcp-interpretation.
Then, $p$ is a minimal model of $P$ iff $Tr(p)$ is a minimal model of $Tr(P)$.

Perfect Models. The alphabet $\Phi^{**}(P)$ is obtained from $\Phi(P)$ by replacing each
predicate symbol $p$ by the new predicate symbols $p^\alpha$ with $\alpha \in TV(P)$.

We slightly modify the translation of mvd-programs and pcp-interpretations
as follows. Every mvd-program $P$ is translated into the following classical disjunctive
logic program $Tr^{*}(P) = Tr(P) \cup Tr_3(P)$ over $\Phi^{**}(P)$:

$$Tr_3(P) = \{A^{0} \vee A^{1} \leftarrow\ \mid A \in HB_P\} \cup \{A^{0} \leftarrow\ \mid A \in HB_P\}.$$

Every pcp-interpretation $p$ is translated into the following classical interpretation:

$$Tr^{*}(p) = Tr(p) \cup \{A^{0} \mid A \in HB_P\}.$$

Roughly speaking, the next lemma shows that pcp-interpretations $p$ into
$TV(P)$ can be identified with their translation $Tr^{*}(p)$.

Lemma 4.5. Let $P$ be an mvd-program. Let $L$ be a classical disjunctive logic
program over the alphabet $\Phi^{**}(P)$, and let $p$ be a pcp-interpretation into $TV(P)$.
Then, $p$ is a model of $L$ iff $Tr^{*}(p)$ is a model of $L \cup Tr_3(P)$.

The following theorem shows that Tr* translates mvd-programs under the 
perfect model semantics into equivalent classical counterparts. 

Theorem 4.6. Let $P$ be an mvd-program, and let $p$ be a pcp-interpretation.
Then, $p$ is a perfect model of $P$ iff $Tr^{*}(p)$ is a perfect model of $Tr^{*}(P)$.

The following theorem shows that the translation Tr(P) of a locally stratified 
mvd-program P is also locally stratified. 

Theorem 4.7. Let $P$ be an mvd-program. If $P$ is locally stratified, then so is
$Tr(P)$.

The next theorem shows that Tr translates locally stratified mvd-programs 
under the perfect model semantics into equivalent classical counterparts. 

Theorem 4.8. Let P be a locally stratified mvd-program, and let p be a pcp- 
interpretation. Then, p is a perfect model of P iff Tr(p) is a perfect model of 
Tr(P). 

Stable Models. For classical disjunctive logic programs $L$ and classical interpretations
$I$, denote by $L/I$ the classical Gelfond-Lifschitz transform of $L$
w.r.t. $I$.

The next lemma shows that for mvd-programs $P$ and pcp-interpretations $q$,
the transform $P/q$ is equivalent to $Tr(P)/Tr(q)$, under all pcp-interpretations
into $TV(P)$.

Lemma 4.9. Let $P$ be an mvd-program, and let $p$ and $q$ be two pcp-interpretations
into $TV(P)$. Then, $p$ is a model of $P/q$ iff $p$ is a model of $Tr(P)/Tr(q)$.

The next theorem shows that Tr translates mvd-programs under the stable 
model semantics into equivalent classical counterparts. 

Theorem 4.10. Let $P$ be an mvd-program, and let $p$ be a pcp-interpretation.
Then, $p$ is a stable model of $P$ iff $Tr(p)$ is a stable model of $Tr(P)$.

Least Model States. The following lemma shows that every mvd-program $P$
is equivalent to its translation Tr(P), concerning disjunctive Herbrand states. 

Lemma 4.11. Let $P$ be an mvd-program, and let $S$ be a state. Then, $S$ is a
model state of $P$ iff $S$ is a model state of $Tr(P)$.

The following theorem shows that Tr translates an mvd-program into a clas- 
sical counterpart that has the same least model state. 

Theorem 4.12. Let $P$ be an mvd-program, and let $S$ be a state. Then, $S$ is the
least model state of $P$ iff $S$ is the least model state of $Tr(P)$.

5 Fixpoint Characterizations 

In this section, we provide many-valued fixpoint characterizations for the seman- 
tics of minimal models, least model states, and perfect models under finite local 
stratification. 

5.1 Minimal Models for Positive Programs 

We now give a fixpoint characterization for the set of all minimal models of 
a positive mvd-program, which is a generalization of the classical counterpart 
given in [3,18]. 

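In the sequel, let $P$ be a positive mvd-program. The canonical form (resp.,
expansion) of a set of pcp-interpretations $\mathcal{P}$, denoted $can(\mathcal{P})$ (resp., $exp(\mathcal{P})$), is
defined by:

$$can(\mathcal{P}) = \{p \in \mathcal{P} \mid \nexists q \in \mathcal{P} : q \subset p\},$$
$$exp(\mathcal{P}) = \{p \in \mathcal{I}_P \mid \exists q \in \mathcal{P} : q \subseteq p\}.$$

We say $\mathcal{P}$ is in canonical form (resp., expanded) iff $\mathcal{P} = can(\mathcal{P})$ (resp.,
$\mathcal{P} = exp(\mathcal{P})$).

The fixpoint operator is defined on the complete lattice $(\mathcal{E}, \sqsubseteq)$, where $\mathcal{E}$ is
the set of all expanded sets of pcp-interpretations, and $\mathcal{P} \sqsubseteq \mathcal{Q}$ iff $\mathcal{Q} \subseteq \mathcal{P}$ for all
$\mathcal{P}, \mathcal{Q} \in \mathcal{E}$. The bottom element $\bot$ is the set of all pcp-interpretations, and the
top element $\top$ is the empty set. The greatest lower bound of any subset of
elements is the union of the elements in the set, and the least upper bound is
the intersection of the elements.

The operator $T_P^{M}$ on expanded sets of pcp-interpretations $\mathcal{P}$ is defined by:

$$T_P^{M}(\mathcal{P}) = \bigcup \{models_p(state_P(p)) \mid p \in \mathcal{P}\},$$

where $state_P$ and $models_p$ are given as follows:

$$
\begin{aligned}
state_P(p) = \{\, & A_1^{\alpha} \vee \cdots \vee A_l^{\alpha} \ \mid\ (A_1 \vee \cdots \vee A_l \leftarrow B_1, \ldots, B_m)[c, 1] \in ground(P), \\
& \alpha = c - 1 + \min(p(B_1), \ldots, p(B_m)) > 0 \,\}, \\
models_p(S) = \{\, & q \in \mathcal{I}_P \ \mid\ q \models S,\ q \supseteq p \,\}.
\end{aligned}
$$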

The next lemma shows the immediate result that $T_P^{M}$ is monotonic.

Lemma 5.1. $T_P^{M}$ is monotonic.

We now define the powers of $T_P^{M}$. For every expanded set of pcp-interpretations $\mathcal{P}$:

$$
T_P^{M} \uparrow \alpha(\mathcal{P}) =
\begin{cases}
\mathcal{P} & \text{if } \alpha = 0; \\
T_P^{M}(T_P^{M} \uparrow (\alpha - 1)(\mathcal{P})) & \text{if } \alpha > 0 \text{ is a successor ordinal;} \\
\mathrm{lub}\{T_P^{M} \uparrow \beta(\mathcal{P}) \mid \beta < \alpha\} & \text{if } \alpha > 0 \text{ is a limit ordinal.}
\end{cases}
$$

As usual, we use $T_P^{M} \uparrow \alpha$ to abbreviate $T_P^{M} \uparrow \alpha(\bot)$.

The following lemma shows that the operator $T_P^{M}$ is not continuous. This
result is immediate by the fact that the classical counterpart of $T_P^{M}$ is not
continuous [18].

Lemma 5.2. $T_P^{M}$ is not continuous.

Even though the operator $T_P^{M}$ is not continuous, its least fixpoint is attained
at the first limit ordinal. This is shown by the following theorem, which follows
from a similar result for classical disjunctive logic programs [18].

Theorem 5.3. $lfp(T_P^{M}) = T_P^{M} \uparrow \omega$.

The next theorem shows that the set of minimal models of $P$ is given by the
canonical form of the least fixpoint of $T_P^{M}$.

Theorem 5.4. $MM(P) = can(lfp(T_P^{M}))$.
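
The characterization in Theorem 5.4 suggests a simple iterative procedure for small ground positive programs. The sketch below is a simplification and not the paper's construction: instead of expanded sets it keeps only canonical (minimal) representatives, it starts from the all-zero interpretation, and it raises one head atom per unsatisfied disjunction. The example program, the frozenset encoding of interpretations, and all function names are hypothetical.

```python
from fractions import Fraction as Fr
from itertools import product

def extensions(clauses, p):
    """Candidates for the minimal q that extend p and satisfy state_P(p)."""
    pd = dict(p)
    pending = []
    for heads, body, c in clauses:
        alpha = c - 1 + min((pd.get(b, Fr(0)) for b in body), default=Fr(1))
        if alpha > 0 and all(pd.get(h, Fr(0)) < alpha for h in heads):
            pending.append((heads, alpha))  # disjunction not yet true in p
    result = set()
    for picks in product(*(heads for heads, _ in pending)):
        q = dict(pd)
        for atom, (_, alpha) in zip(picks, pending):
            q[atom] = max(q.get(atom, Fr(0)), alpha)
        result.add(frozenset(q.items()))
    return result

def subset(p, q):
    """p is a subset of q iff p(A) <= q(A) for all atoms A."""
    qd = dict(q)
    return all(v <= qd.get(a, Fr(0)) for a, v in p)

def can(interps):
    return {p for p in interps if not any(q != p and subset(q, p) for q in interps)}

def minimal_models(clauses):
    current = {frozenset()}  # the all-zero interpretation
    while True:
        nxt = can(set().union(*(extensions(clauses, p) for p in current)))
        if nxt == current:
            return current
        current = nxt

# Hypothetical ground program: (a v b <-)[0.5, 1] and (c <- a)[0.9, 1].
clauses = [(["a", "b"], [], Fr(1, 2)), (["c"], ["a"], Fr(9, 10))]
for model in minimal_models(clauses):
    print(sorted(model))
```

For this toy program the procedure returns two interpretations: one with a = 1/2 and c = 2/5, and one with b = 1/2. Atoms not listed in a result have truth value 0.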

5.2 Least Model States for Positive Programs 

We now give a fixpoint characterization for the least model state of a posi- 
tive mvd-program, which is a generalization of the classical counterpart given 
in [12,4]. 

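In the sequel, let $P$ be a positive mvd-program. We now identify every disjunction
$D \in DHB_P$ with the set of all contained atoms $A^{\alpha} \in HB_P^{*}$.

The operator $T_P^{S}$ on expanded disjunctive Herbrand states $S$ is defined by:

$$
\begin{aligned}
T_P^{S}(S) = exp(\{\, & A_1^{\alpha} \vee \cdots \vee A_l^{\alpha} \vee D_1 \vee \cdots \vee D_m \ \mid\ D_1, \ldots, D_m \in DHB_P, \\
& (A_1 \vee \cdots \vee A_l \leftarrow B_1 \wedge \cdots \wedge B_m)[c, 1] \in ground(P), \\
& B_1^{\beta_1} \vee D_1, \ldots, B_m^{\beta_m} \vee D_m \in S,\ \alpha = c - 1 + \min(\beta_1, \ldots, \beta_m) > 0 \,\}).
\end{aligned}
$$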

The following lemma shows that the model states of $P$ correspond exactly
to the pre-fixpoints of the operator $T_P^{S}$.

Lemma 5.5. Let $S$ be an expanded state. Then, $S$ is a model state of $P$ iff
$T_P^{S}(S) \subseteq S$.

The next lemma shows that the operator $T_P^{S}$ is continuous. This result follows
immediately from the continuity of the classical counterpart of $T_P^{S}$ [12].

Lemma 5.6. $T_P^{S}$ is continuous.

The powers of $T_P^{S}$ are defined as usual: For all Herbrand states $S$, define
$T_P^{S} \uparrow \omega(S)$ as the union of all $T_P^{S} \uparrow n(S)$ with $n < \omega$, where $T_P^{S} \uparrow 0(S) = S$ and
$T_P^{S} \uparrow (n + 1)(S) = T_P^{S}(T_P^{S} \uparrow n(S))$ for all $n < \omega$. We use $T_P^{S} \uparrow \omega$ to abbreviate
$T_P^{S} \uparrow \omega(\emptyset)$.

The following theorem shows that the least model state of $P$ coincides with
the least fixpoint of $T_P^{S}$, and that the least fixpoint is attained at the first limit
ordinal. This result follows immediately from Lemmata 5.5 and 5.6.

Theorem 5.7. $MS_P = lfp(T_P^{S}) = T_P^{S} \uparrow \omega$.

We give an illustrative example. 

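Example 5.8. Consider again the positive mvd-program $P$ given in Example 3.5.
Its least model state $MS_P$ is given by $T_P^{S} \uparrow \omega = T_P^{S} \uparrow 3$:

$$
\begin{aligned}
can(T_P^{S} \uparrow 1) &= S_1 = \{\mathit{closed}^{0.5}(r) \vee \mathit{closed}^{0.5}(s),\ \mathit{road}^{0.8}(r, a, b),\ \mathit{road}^{0.7}(s, b, c)\}, \\
can(T_P^{S} \uparrow 2) &= S_2 = S_1 \cup \{\mathit{reach}^{0.7}(a, b) \vee \mathit{closed}^{0.7}(r),\ \mathit{reach}^{0.6}(b, c) \vee \mathit{closed}^{0.6}(s)\}, \\
can(T_P^{S} \uparrow 3) &= S_3 = S_2 \cup \{\mathit{reach}^{0.5}(a, c) \vee \mathit{closed}^{0.7}(r) \vee \mathit{closed}^{0.6}(s)\}.
\end{aligned}
$$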
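
The iteration of this example can be replayed with a small program. The sketch below is an approximation of the operator, not the paper's definition verbatim: it works on canonical forms rather than expanded states, it merges repeated atoms in a derived disjunction by keeping the smaller threshold, and it decides pc-entailment between two disjunctions by a syntactic threshold comparison. These choices, the encoding of $A^\alpha$ as a pair, and all function names are assumptions made for illustration.

```python
from fractions import Fraction as Fr
from itertools import product

def normalize(pairs):
    """Merge repeated atoms: A^a v A^b is equivalent to A^min(a, b)."""
    best = {}
    for atom, alpha in pairs:
        best[atom] = min(alpha, best.get(atom, alpha))
    return frozenset(best.items())

def entails(d1, d2):
    """Assumed test: each A^a in d1 is matched by some A^b in d2 with b <= a."""
    low = dict(d2)
    return all(atom in low and low[atom] <= alpha for atom, alpha in d1)

def can(state):
    return {d for d in state if not any(e != d and entails(e, d) for e in state)}

def step(clauses, state):
    """One application of the state operator, kept in canonical form."""
    derived = set(state)
    for heads, body, c in clauses:
        # For each body atom b, collect (beta, rest) from disjunctions b^beta v rest.
        options = [[(alpha, d - {(atom, alpha)})
                    for d in state for atom, alpha in d if atom == b]
                   for b in body]
        for choice in product(*options):
            alpha = c - 1 + min((beta for beta, _ in choice), default=Fr(1))
            if alpha > 0:
                pairs = [(h, alpha) for h in heads]
                for _, rest in choice:
                    pairs.extend(rest)
                derived.add(normalize(pairs))
    return can(derived)

def least_model_state(clauses):
    state = set()
    while True:
        nxt = step(clauses, state)
        if nxt == state:
            return state
        state = nxt

# Ground instances of the program of Example 3.5 over the constants r, s, a, b, c.
clauses = [(["closed(r)", "closed(s)"], [], Fr(5, 10)),
           (["road(r,a,b)"], [], Fr(8, 10)),
           (["road(s,b,c)"], [], Fr(7, 10))]
for x, y, z in product("rsabc", repeat=3):
    clauses.append(([f"reach({y},{z})", f"closed({x})"], [f"road({x},{y},{z})"], Fr(9, 10)))
    clauses.append(([f"reach({x},{z})"], [f"reach({x},{y})", f"reach({y},{z})"], Fr(9, 10)))

for d in sorted(least_model_state(clauses), key=len):
    print(sorted(d))
```

Under these assumptions the loop stabilizes with the six disjunctions of $S_3$, with thresholds shown as exact fractions (for example 3/5 for 0.6).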

5.3 Perfect Models under Finite Local Stratification 

We now give an iterative fixpoint characterization of perfect models of mvd- 
programs with finite local stratification. It generalizes the classical counterpart 
in [18]. 

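For sets of emvd-clauses $P$ and sets of expanded interpretations $\mathcal{P}$, we define:

$$T_P^{M}(\mathcal{P}) = \bigcup \{models_p(state_P(p)) \mid p \in \mathcal{P}\},$$

where $models_p$ is defined as in Section 5.1 and $state_P$ is given by:

$$
\begin{aligned}
state_P(p) = \{\, & A_1^{\alpha} \vee \cdots \vee A_l^{\alpha} \ \mid\ (A_1 \vee \cdots \vee A_l;\ d \leftarrow B_1, \ldots, B_m)[c, 1] \in ground(P), \\
& \alpha = c - 1 + \min(p(B_1), \ldots, p(B_m)) > d \,\}.
\end{aligned}
$$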

The following theorem formulates the iterative fixpoint characterization. 

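Theorem 5.9. Let $P$ be an mvd-program and let $H_1, H_2, \ldots, H_n$ be a finite local
stratification of $P$. For pcp-interpretations $p$, we define:

$$P_i(p) = P_i/p \cup \{(A \leftarrow)[\alpha, 1] \mid A \in HB_P,\ p(A) \ge \alpha\}.$$

Then, the set of perfect models of $P$ is given as $\mathcal{P}_n$, where

$$\mathcal{P}_1 = can(T_{P_1}^{M} \uparrow \omega),$$
$$\mathcal{P}_i = \bigcup \{can(T_{P_i(p)}^{M} \uparrow \omega) \mid p \in \mathcal{P}_{i-1}\} \quad \text{for all } i \in \{2, \ldots, n\}.$$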

6 Summary and Outlook

We introduced least model states for many-valued disjunctive logic programs. 
We then showed how to unfold many-valuedness under the semantics of minimal
models, perfect models, stable models, and least model states. Thus, existing 
technology for classical disjunctive logic programming can be used to implement 
many-valued disjunctive logic programming. Using these results, we gave many- 
valued fixpoint characterizations for the set of all minimal models and the least 
model state. We also gave an iterative fixpoint characterization for the perfect 
model semantics under finite local stratification. 

An interesting topic of future research is to elaborate other semantics for 
many-valued disjunctive logic programs, for example, to define partial stable 
models. Moreover, it would be very interesting to work out fixpoint characteri- 
zations for stable (and partial stable) models. This may be done by generalizing 
the evidential transformation in [2] or the 3-S transformation in [17]. Finally, 
another topic of future research is to elaborate proof theories for the various 
semantics. 